In this paper we investigate undirected discrete graphical tree models when all the variables in
the system are binary, where leaves represent the observable variables and where all the inner 
nodes are unobserved. A novel approach based on the theory of partially ordered sets allows us 
to obtain a convenient parametrization of this model class. The construction of the proposed 
coordinate system mirrors the combinatorial definition of cumulants. A simple product-like 
form of the resulting parametrization gives insight into identifiability issues associated with this 
model class. In particular, we provide necessary and sufficient conditions for such a model to be 
identified up to the switching of labels of the inner nodes. When these conditions hold, we give 
explicit formulas for the parameters of the model. Whenever the model fails to be identified, we 
use the new parametrization to describe the geometry of the unidentified parameter space. We 
illustrate these results using a simple example. 

1. Introduction

Discrete graphical models have become a very popular tool in the statistical analysis
of multivariate problems (see, e.g., [7, 19]). When all the variables in the system are
observed, they exhibit a useful modularity. In particular, it is possible to estimate all the
conditional probabilities that parametrize such models: maximum likelihood estimates
are simple sample proportions and a conjugate Bayesian analysis is straightforward.
However, if the values of some of the variables are unobserved, then the resulting model
for the observed variables often becomes very complex, making inference much more
difficult.

The complicated structure of models with hidden variables usually leads to difficulties 
in establishing the identifiability of their parameters (see, e.g., [1]). In this paper, we show 
how algebraic and combinatorial techniques can help. We focus on graphical models where 
the underlying graph is a tree and all the inner nodes represent hidden variables. In the
computational biology literature, these models are called the general Markov models (see,
e.g., [14]), tree models or tree decomposable distributions (cf. [10]). Building on results
of Chang [4], in this paper we analyze issues associated with identifiability of such a tree
model when all its variables are binary, paying particular attention to the geometry of
the unidentified space. In particular, we obtain necessary and sufficient conditions for
this model to be locally identified, which gives a stronger version of Theorem 4.1 in [4].
When these conditions are satisfied, we also obtain exact formulae for its parameters in
terms of the marginal distribution over the observed variables.

This is an electronic reprint of the original article published by the ISI/BS in Bernoulli,
2012, Vol. 18, No. 1, 290-321. This reprint differs from the original in pagination and
typographic detail.

1350-7265 © 2012 ISI/BS

P. Zwiernik and J.Q. Smith

Figure 1. The tripod tree model: a hidden node $H$ joined to three observed leaves $X_1$, $X_2$, $X_3$.

Our strategy is to define a new parametrization of this model class. The new coordinate
system is based on moments rather than conditional probabilities. This helps us to
exploit various invariance properties of tree models, which, in turn, enables us to express
the dependence structure implied by the tree more elegantly. Furthermore, because the
parametrization is based on well-understood moments, the implied dependence structure
becomes more transparent.

The motivation for this methodology sprang from the study of the tripod tree model,
which is the simplest naive Bayes model. The model is a graphical model given in Figure 1,
where the black nodes represent three observed variables, $X_1, X_2, X_3$, and the white
node indicates a hidden variable $H$, whose values are never directly observed. We assume
all the variables in the system have values in $\{0,1\}$. For
$\alpha = (\alpha_1,\alpha_2,\alpha_3) \in \{0,1\}^3$ let
$p_\alpha = \mathbb{P}(X_1 = \alpha_1, X_2 = \alpha_2, X_3 = \alpha_3)$. This model would usually
be parametrized using conditional probabilities. In this case we would write

$$p_\alpha = \theta^{(h)}_0 \prod_{i=1}^{3} \theta^{(i)}_{\alpha_i|0} + \theta^{(h)}_1 \prod_{i=1}^{3} \theta^{(i)}_{\alpha_i|1}, \tag{1}$$

where $\theta^{(h)}_i = \mathbb{P}(H = i)$ and $\theta^{(j)}_{\alpha_j|i} = \mathbb{P}(X_j = \alpha_j \mid H = i)$. It can be seen that there are seven
free parameters needed to specify $p_\alpha$, namely: $\theta^{(h)}_1$ together with $\theta^{(j)}_{1|i}$ for $i = 0,1$ and
$j = 1,2,3$.

However, the definition of this model given in (1) becomes more transparent when
expressed in terms of moments. It is easy to check that there is a one-to-one correspondence
between the probabilities $p_\alpha$ for $\alpha \in \{0,1\}^3$ and the four central moments
$\mu_{ij} = \mathbb{E}(X_i - \lambda_i)(X_j - \lambda_j)$ for $i < j$ in $\{1,2,3\}$ and
$\mu_{123} = \mathbb{E}(X_1 - \lambda_1)(X_2 - \lambda_2)(X_3 - \lambda_3)$,
supplemented by the three means $\lambda_i = \mathbb{E}X_i$ for $i = 1,2,3$ (cf. Appendix A.1).

Let $\bar\mu_h = 1 - 2\theta^{(h)}_1$, $\bar\mu_i = 1 - 2\lambda_i$ and $\eta_{h,i} = \theta^{(i)}_{1|1} - \theta^{(i)}_{1|0}$ for $i = 1,2,3$. We can now write
an explicit isomorphism between the original seven parameters $(\theta^{(h)}_1, (\theta^{(i)}_{1|0}, \theta^{(i)}_{1|1}))$ and
new parameters $(\bar\mu_h, (\bar\mu_i), (\eta_{h,i}))$ for $i = 1,2,3$. In [15], it is shown that, in the new
coordinate system and with the new parameters, the model class is equivalently
given by

$$\begin{aligned}
\lambda_i &= \tfrac{1}{2}(1 - \bar\mu_i) && \text{for } i = 1,2,3,\\
\mu_{ij} &= \tfrac{1}{4}(1 - \bar\mu_h^2)\,\eta_{h,i}\,\eta_{h,j} && \text{for all } i \neq j \in \{1,2,3\} \text{ and}\\
\mu_{123} &= \tfrac{1}{4}(1 - \bar\mu_h^2)\,\bar\mu_h\,\eta_{h,1}\,\eta_{h,2}\,\eta_{h,3}.
\end{aligned} \tag{2}$$

The product-like form of this parametrization enables us to see various interesting
constraints on the observed nodes. For example, by multiplying the formulae for $\mu_{12}$, $\mu_{13}$ and
$\mu_{23}$ in (2) together we can see that $\mu_{12}\mu_{13}\mu_{23} \geq 0$ must hold. It also allows us to find
explicit formulae for the parameters of the model in terms of the marginal distribution
on the set of observed variables. For example, when $\mu_{12}\mu_{13}\mu_{23} \neq 0$, by substituting (2)
for all the observed moments, we see that

$$\bar\mu_h^2 = \frac{\mu_{123}^2}{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}, \qquad \eta_{h,i}^2 = \frac{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}{\mu_{jk}^2} \quad \text{for } i = 1,2,3, \tag{3}$$

where $\{j,k\} = \{1,2,3\} \setminus \{i\}$.
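
As a numerical sanity check of this product form, the following Python sketch builds a tripod distribution from conditional probabilities, computes its central moments by brute force, and compares them with the moment formulas in (2) and the inversion formulas for the squared parameters. The function names and all numerical parameter values are illustrative assumptions, not part of the original text.

```python
from itertools import product


def tripod_distribution(theta_h, theta):
    """Joint distribution of (X1, X2, X3) for the tripod tree.

    theta_h  : P(H = 1)
    theta[i] : pair (P(X_i = 1 | H = 0), P(X_i = 1 | H = 1))
    """
    p = {}
    for alpha in product((0, 1), repeat=3):
        total = 0.0
        for h in (0, 1):
            term = theta_h if h == 1 else 1.0 - theta_h
            for i, a in enumerate(alpha):
                q = theta[i][h]
                term *= q if a == 1 else 1.0 - q
            total += term
        p[alpha] = total
    return p


def central_moment(p, subset):
    """E prod_{i in subset} (X_i - E X_i) under the distribution p."""
    lam = [sum(pr * a[i] for a, pr in p.items()) for i in range(3)]
    moment = 0.0
    for a, pr in p.items():
        term = pr
        for i in subset:
            term *= a[i] - lam[i]
        moment += term
    return moment


# Hypothetical parameter values chosen only for illustration.
theta_h = 0.3
theta = [(0.2, 0.7), (0.4, 0.9), (0.6, 0.1)]
p = tripod_distribution(theta_h, theta)

# New parameters: mu_bar_h = 1 - 2 P(H = 1), eta_i = P(X_i=1|H=1) - P(X_i=1|H=0).
mu_bar_h = 1.0 - 2.0 * theta_h
eta = [t1 - t0 for t0, t1 in theta]
var_h = 0.25 * (1.0 - mu_bar_h ** 2)

mu12 = central_moment(p, (0, 1))
mu13 = central_moment(p, (0, 2))
mu23 = central_moment(p, (1, 2))
mu123 = central_moment(p, (0, 1, 2))

# Inversion: recover the squared parameters from the observed moments.
denom = mu123 ** 2 + 4.0 * mu12 * mu13 * mu23
mu_bar_h_sq = mu123 ** 2 / denom
eta_sq = [denom / mu23 ** 2, denom / mu13 ** 2, denom / mu12 ** 2]
```

Only the squares are recovered, which reflects the fact that the parameters can be identified at best up to switching the labels of the hidden variable.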



A similar parametrization is known for general naive Bayesian models; see the
Appendix in [6]. The new parametrization for this model class was used in [13] to
approximate a marginal likelihood when the sample size is large, in [6] to understand
the local geometry of the model class and in [2] to provide the full description of these
models in terms of the defining equations and inequalities.

Naive Bayesian models are a particular example of general Markov models. The class 
of tree models is somewhat more complicated than the naive Bayesian models and needs 
new tools to examine its geometry. In this paper, we investigate the moment structures 
induced by tree models using the theory of partially ordered sets and Möbius functions.
Similar methods were used in the combinatorial theory of cumulants (see [12, 17]) for 
a poset of all partitions of a finite set. To our knowledge, this paper is the first to use 
more general posets in statistical analysis, although a similar approach can be found in 
the theory of free probability (see, e.g., [18]). 

The paper is organized as follows. In Section 2 we define and analyze the moment
structures of the class of models under consideration. In Section 3 we define tree cumulants,
which form a new coordinate system for this model class. In Section 4 we reparametrize
the model and show that the induced parametrization on the observed margin has an
elegant product-like form. We apply this reparametrization in Section 5, analyzing the
local geometry of the tree models and the geometry of the subsets of the parameter
space that give the same set of marginal distributions on the set of observed variables. In
Section 6 we illustrate this method using a simple general Markov model given by a tree
with two hidden nodes.



2. Independence models on trees 

In this section, we introduce models defined by global Markov properties on trees. 

2.1. Preliminaries on trees

A graph $G$ is an ordered pair $(V,E)$ consisting of a non-empty set $V$ of nodes (or vertices)
and a set $E$ of edges, each of which is an element of $V \times V$. An edge $(u,v) \in E$ is directed
if the pair $(u,v)$ is ordered and we represent the edge by an arrow from $u$ to $v$. If $(u,v)$
is not an ordered pair, then we say that $(u,v)$ is an undirected edge. Graphs with only
(un)directed edges are called (un)directed. If $e = (u,v)$ is an edge of a graph $G$, then $u$
and $v$ are called adjacent and $e$ is said to be incident with $u$ and $v$. If $v \in V$, the degree
of $v$ is denoted by $\deg(v)$, and is the number of edges incident with $v$. A path in a graph $G$
is a sequence of nodes $(v_1, v_2, \ldots, v_k)$ such that, for all $i = 1, \ldots, k-1$, $v_i$ and $v_{i+1}$ are
adjacent. If, in addition, $v_1 = v_k$, then the path is called a cycle. A graph is connected if
each pair of nodes in $G$ can be joined by a path.

A (directed) tree $T = (V,E)$ is a connected (directed) graph with no cycles. A node
of $T$ of degree one is called a leaf. A node of $T$ that is not a leaf is called an inner node.
An edge $e$ of $T$ is inner if both nodes incident with $e$ are inner nodes. A connected
subgraph of $T$ is a subtree of $T$. A rooted tree, $T^r$, is a directed tree that has one
distinguished node called the root, denoted by the letter $r$, and edges that are directed
away from $r$. Let $T^r$ be a rooted tree. For every node $v$ of $T^r$ we let $\mathrm{pa}(v)$ denote the
set of nodes $u$ such that $(u,v) \in E$. If $v$ is the root, then $\mathrm{pa}(v) = \emptyset$. Otherwise $\mathrm{pa}(v)$ is
a singleton.

For any $W \subseteq V$ we define $T(W)$ as the minimal subtree of $T$ whose set of nodes
contains $W$. We say that $T(W)$ is the subtree of $T$ spanned on $W$. Henceforth, denote
the edge set of $T(W)$ by $E(W)$ and its set of nodes by $V(W)$. If $T$ is rooted, then let $r(W)$
denote the unique node $v$ of $T(W)$ such that $\mathrm{pa}(v) \cap V(W)$ is the empty set.

Let $T = (V,E)$ be a tree where $e = (u,v)$ denotes one of its edges. Then contracting $e$
results in another tree, denoted by $T/e$, with the edge $e$ removed and its incident nodes $u$
and $v$ identified. Similarly, for any $E' \subseteq E$ we denote the tree obtained from $T$ by
contracting all edges in $E'$ by $T/E'$. If $v \in V$ is such that $\deg v = 2$, then to suppress $v$ we
simply contract one of the edges incident with $v$. The resulting tree is denoted by $T/v$.

2.2. Models defined by global Markov properties 

In this paper, we always assume that random variables are binary, taking either value 0
or 1. The vector $Y$ has as its components all variables in the graphical model, that is,
both hidden and observed variables. Denote the subvector of $Y$ of observed variables
by $X$ and the subvector of hidden variables by $H$.

Let $T = (V,E)$ be an undirected tree. For any three disjoint subsets $A, B, C \subseteq V$ we
say that $C$ separates $A$ and $B$ in $T$, denoted by $A \perp_T B \mid C$, if each path from a node in $A$
to a node in $B$ passes through a node in $C$. For any $A \subseteq V$ let $Y_A$ denote the subvector of
$Y = (Y_v)_{v \in V}$ with elements indexed by $A$, that is, $Y_A = (Y_v)_{v \in A}$. We are interested in
statistical models for $Y$ defined by global Markov properties (GMP) on $T$. By definition
(see, e.g., [7], Section 3.2.1), these models are specified through the set of conditional
independence statements of the form:

$$\{Y_A \perp\!\!\!\perp Y_B \mid Y_C : \text{for all } A, B, C \subseteq V \text{ s.t. } A \perp_T B \mid C\}. \tag{4}$$

Let $\widetilde{\mathcal{M}}_T$ denote the space of probability distributions of $(X,H)$ satisfying the global
Markov properties on $T$. We now let $\mathcal{M}_T$ denote the space of marginal probability
distributions on $X$ induced from distributions over $(X,H)$ that are in $\widetilde{\mathcal{M}}_T$.

2.3. Models for rooted trees 

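We next present the parametric formulation of the models presented in the previous
section. A Markov process on a rooted tree $T^r$ is a collection of random variables,
$\{Y_v : v \in V\}$, such that for each $\alpha = (\alpha_v)_{v \in V} \in \{0,1\}^V$

$$p_\alpha(\theta) = \prod_{v \in V} \theta^{(v)}_{\alpha_v \mid \alpha_{\mathrm{pa}(v)}}, \tag{5}$$

where $\mathrm{pa}(r)$ is the empty set, $\theta = (\theta^{(v)}_{\alpha_v \mid \alpha_{\mathrm{pa}(v)}})$ and
$\theta^{(v)}_{\alpha_v \mid \alpha_{\mathrm{pa}(v)}} = \mathbb{P}(Y_v = \alpha_v \mid Y_{\mathrm{pa}(v)} = \alpha_{\mathrm{pa}(v)})$.

Since $\theta^{(r)}_0 + \theta^{(r)}_1 = 1$ and $\theta^{(v)}_{0|i} + \theta^{(v)}_{1|i} = 1$ for all $v \in V \setminus \{r\}$ and $i = 0,1$, the set of parameters
consists of exactly $2|E| + 1$ free parameters: we have two parameters, $\theta^{(v)}_{1|0}$ and $\theta^{(v)}_{1|1}$, for each
edge $(u,v) \in E$ and one parameter, $\theta^{(r)}_1$, for the root. We denote the parameter space by
$\Theta_T = [0,1]^{2|E|+1}$.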

Suppose that $T^r$ has $n$ leaves representing a binary random vector, $X = (X_1, \ldots, X_n)$,
and let

$$\Delta_{2^n-1} = \Big\{ p \in \mathbb{R}^{2^n} : \sum_\beta p_\beta = 1,\ p_\beta \geq 0 \Big\} \tag{6}$$

with indices $\beta$ ranging over $\{0,1\}^n$ be the probability simplex of all possible distributions
of $X$. Equation (5) induces a polynomial map, $f_T : \Theta_T \to \Delta_{2^n-1}$, obtained by marginalization
over all the inner nodes of $T$, giving the marginal mass function $p_\beta(\theta)$ as

$$p_\beta(\theta) = \sum_{\alpha \in \mathcal{H}} \prod_{v \in V} \theta^{(v)}_{\alpha_v \mid \alpha_{\mathrm{pa}(v)}}. \tag{7}$$

Here, $\mathcal{H}$ denotes the set of all $\alpha \in \{0,1\}^V$ such that the restriction to the leaves of $T$
is equal to $\beta$. The image of this map is, by definition, the general Markov model on $T^r$
(cf. [14], Section 8.3, [10]).

Standard theory in graphical models tells us that the Markov process on $T^r$ is equal
to $\widetilde{\mathcal{M}}_T$ and, consequently, that the general Markov model on $T^r$ is equal to $\mathcal{M}_T$. Indeed,
since $T^r$ is a perfect directed graph (see Section 2.1.3 in [7]), by [7], Theorem 3.28,
the Markov properties are equivalent to the factorization with respect to the undirected
version of $T^r$, which is just $T$. Since $T$ is decomposable, by [7], Proposition 3.19, the
factorization according to $T$ is equivalent to the global Markov properties on $T$.
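
The marginalization map $f_T$ can be made concrete with a brute-force sketch in Python: sum the product of conditional probabilities over all states of the non-leaf nodes. The function name, the dictionary-based tree encoding and the parameter values below are assumptions made for illustration; the enumeration is exponential in the number of nodes and is only meant for small trees.

```python
from itertools import product


def leaf_distribution(root, parent, leaves, root_p1, cond_p1):
    """Marginal distribution of the leaves of a rooted binary tree model.

    parent     : dict mapping every non-root node to its parent
    root_p1    : P(Y_root = 1)
    cond_p1[v] : pair (P(Y_v = 1 | Y_pa(v) = 0), P(Y_v = 1 | Y_pa(v) = 1))
    """
    nodes = [root] + list(parent)
    p = {}
    for values in product((0, 1), repeat=len(nodes)):
        alpha = dict(zip(nodes, values))
        prob = root_p1 if alpha[root] == 1 else 1.0 - root_p1
        for v, u in parent.items():
            q = cond_p1[v][alpha[u]]
            prob *= q if alpha[v] == 1 else 1.0 - q
        beta = tuple(alpha[i] for i in leaves)
        # Summing over states that agree on the leaves marginalizes the inner nodes.
        p[beta] = p.get(beta, 0.0) + prob
    return p


# Hypothetical example: a tree with inner nodes "a" (root) and "b",
# leaves 1, 2 attached to "a" and leaves 3, 4 attached to "b".
parent = {"b": "a", 1: "a", 2: "a", 3: "b", 4: "b"}
cond_p1 = {"b": (0.2, 0.8), 1: (0.1, 0.6), 2: (0.3, 0.9), 3: (0.7, 0.2), 4: (0.25, 0.75)}
p = leaf_distribution("a", parent, [1, 2, 3, 4], 0.35, cond_p1)

# Number of free parameters: two per edge plus one for the root.
n_free = 2 * len(parent) + 1
```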

In this paper, we often focus on trivalent trees, that is, trees such that every inner
node has degree three. This is an important subclass because, by the well-known lemma
below (see, e.g., [10], Section 2), the nodes of valency two in a given tree add nothing to
the model class $\mathcal{M}_T$.

Lemma 2.1. Let $T$ be a tree. Let $v \in V$ be a node of degree two and let $T' = T/v$ be the
tree obtained from $T$ by suppressing $v$. Then $P \in \mathcal{M}_T$ if and only if $P \in \mathcal{M}_{T'}$.

Corollary 2.2. Let $T$ be a tree and let $i, j, k$ be any three leaves of $T$. The marginal
model on $(X_i, X_j, X_k)$ induced from $\mathcal{M}_T$ and denoted by $\mathcal{M}_{T(ijk)}$ is equivalent to the
tripod tree model where the tripod tree is given in Figure 1.

In addition, the model corresponding to any tree is a submodel of a model correspond- 
ing to a trivalent tree. To show this, we need the following definition. 

Definition 2.3. Let $T$ be any tree. A trivalent expansion of $T$, denoted by $T^*$, is any
tree $T^* = (V^*, E^*)$ in which each inner node has degree at most three and for which there
exists a set of inner edges $E' \subseteq E^*$ such that $T = T^*/E'$.

Lemma 2.4. Let $T$ be a tree and $T^* = (V^*, E^*)$ its trivalent expansion with $E' \subseteq E^*$
such that $T = T^*/E'$. Then $\mathcal{M}_T \subseteq \mathcal{M}_{T^*}$.

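Proof. Let $p$ be a point in $\mathcal{M}_T$. Then $p = f_T(\theta)$ for some $\theta \in \Theta_T$. Identifying edges of $T^*$
and $T$ in the obvious way, we can write $E^* = E' \cup E$. Define $\theta^* \in \Theta_{T^*}$ as follows. For all
$\alpha_u, \alpha_v \in \{0,1\}$

$$\begin{aligned}
\theta^{*(v)}_{\alpha_v|\alpha_u} &= \theta^{(v)}_{\alpha_v|\alpha_u} && \text{if } (u,v) \in E,\\
\theta^{*(v)}_{\alpha_v|\alpha_u} &= \delta_{\alpha_u \alpha_v} && \text{for every } (u,v) \in E',
\end{aligned} \tag{8}$$

where $\delta_{ij}$ denotes the Kronecker delta. It is now simple to check that $f_{T^*}(\theta^*) = p$. It
follows that $p \in \mathcal{M}_{T^*}$. □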

For these reasons, we can usually safely restrict our attention to trivalent trees. 



2.4. Moments and conditional independence 

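Let $X = (X_1, \ldots, X_n)$ be a random vector and for each $\beta = (\beta_1, \ldots, \beta_n) \in \mathbb{N}^n$ denote
$X^\beta = \prod_i X_i^{\beta_i}$. We shall denote $\mathbb{E}X^\beta$ by $\lambda_\beta$ and $\mathbb{E}U^\beta$ by $\mu_\beta$, where $U_i = X_i - \mathbb{E}X_i$.

When $\beta \in \{0,1\}^n$, it is often convenient to use an alternate notation. Thus, for subsets
$I \subseteq [n] := \{1, 2, \ldots, n\}$, we let $\lambda_I = \mathbb{E}(\prod_{i \in I} X_i)$ and $\mu_I = \mathbb{E}(\prod_{i \in I} U_i)$. Note that $\lambda_{e_i}$, where $e_i$
is the standard basis vector in $\mathbb{R}^n$, can also be denoted by $\lambda_i$ for $i = 1, \ldots, n$.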

The model $\mathcal{M}_T$ in the previous section is given in terms of the probabilities as the
image of the map in (7). We find it convenient to change these coordinates. Let $[n]_{\geq 2}$
denote all subsets of $[n]$ with at least two elements. Denote by $\mathcal{C}_n$ the set of values of all
the means $\lambda_1, \ldots, \lambda_n$ together with central moments $\mu_I$ such that $I \in [n]_{\geq 2}$ for all
possible probabilities in $\Delta_{2^n-1}$. There exists a polynomial isomorphism, $f_{p\mu} : \Delta_{2^n-1} \to \mathcal{C}_n$,
with the inverse denoted by $f_{\mu p}$ (for details see Appendix A.1). Consequently, we can
express any distribution in the general Markov model in terms of its central moments
and means.

For any two sets $A$, $B$ let $AB$ denote $A \cup B$. If $X_A \perp\!\!\!\perp X_B$, then $\mu_{IJ} = \mu_I \mu_J$ for all
non-empty $I \subseteq A$, $J \subseteq B$. However, when all variables are binary, we also have a converse
result. Thus, if for all non-empty $I \subseteq A$, $J \subseteq B$ we have that $\mu_{IJ} = \mu_I \mu_J$, then $X_A \perp\!\!\!\perp X_B$.

Indeed, the independence expressed in terms of moments (see, e.g., Feller [5], page 136)
gives

$$X_A \perp\!\!\!\perp X_B \iff \mathrm{Cov}(f(X_A), g(X_B)) = 0 \quad \text{for all } f \in L^2(X_A),\ g \in L^2(X_B). \tag{9}$$

Since our variables are binary, all the functions of $X_A$ and $X_B$ are just polynomials
with square-free monomials. Equivalently, every function of $X_A$ or $X_B$ can be written
as a polynomial with square-free monomials in $U_A$ or $U_B$, respectively. For instance,
because $X_1, X_2 \in \{0,1\}$, for any positive integers $\beta_1, \beta_2$,

$$X_1^{\beta_1} X_2^{\beta_2} = X_1 X_2 = (U_1 + \lambda_1)(U_2 + \lambda_2) = U_1 U_2 + \lambda_2 U_1 + \lambda_1 U_2 + \lambda_1 \lambda_2.$$

Since the covariance is a bilinear form, Settimi and Smith [16] concluded that the
independence can be checked only on these monomials and (9) can be rewritten as

$$X_A \perp\!\!\!\perp X_B \iff \mathrm{Cov}(U_A^\alpha, U_B^\beta) = 0 \quad \text{for all } \alpha \in \{0,1\}^{|A|},\ \beta \in \{0,1\}^{|B|}. \tag{10}$$

Moreover, $\mathrm{Cov}(U_A^\alpha, U_B^\beta) = 0$ holds for each non-zero $\alpha \in \{0,1\}^{|A|}$ and $\beta \in \{0,1\}^{|B|}$ if and
only if $\mu_{IJ} = \mu_I \mu_J$ for each non-empty $I \subseteq A$, $J \subseteq B$.

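We can generalize the result above. For a random variable $H_a$ let $\lambda_a = \mathbb{E}H_a$ and
$U_a = H_a - \lambda_a$. For each $I \subseteq [n]$ let $U_I = \prod_{i \in I} U_i$ and

$$\eta_{a,I} = \mathbb{E}(U_I U_a) / \mathrm{Var}(H_a). \tag{11}$$

Note that under this notation $\mathrm{Var}(H_a) = \lambda_a(1 - \lambda_a)$.

Proposition 2.5. Let $H_a$ be a non-degenerate random variable. With the notation above,
we have $X_A \perp\!\!\!\perp X_B \mid H_a$ if and only if for all non-empty $I \subseteq A$, $J \subseteq B$

$$\begin{aligned}
\mu_{IJ} &= \mu_I \mu_J + \lambda_a(1 - \lambda_a)\,\eta_{a,I}\,\eta_{a,J},\\
\eta_{a,IJ} &= \mu_I \eta_{a,J} + \eta_{a,I} \mu_J + (1 - 2\lambda_a)\,\eta_{a,I}\,\eta_{a,J}.
\end{aligned} \tag{12}$$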

Proof. The definition of independence given in (10) induces a condition for
$X_A \perp\!\!\!\perp X_B \mid H_a$. Thus, for each $I \subseteq A$, $J \subseteq B$ we have

$$\mathrm{Cov}(U_I, U_J \mid H_a = 0) = \mathrm{Cov}(U_I, U_J \mid H_a = 1) = 0, \tag{13}$$

so, in particular

$$\begin{aligned}
\lambda_a \mathrm{Cov}(U_I, U_J \mid H_a = 1) + (1 - \lambda_a)\,\mathrm{Cov}(U_I, U_J \mid H_a = 0) &= 0,\\
\mathrm{Cov}(U_I, U_J \mid H_a = 0) - \mathrm{Cov}(U_I, U_J \mid H_a = 1) &= 0.
\end{aligned} \tag{14}$$

Moreover, for any $I \subseteq [n]$, one has $\mathbb{E}(U_I \mid H_a) = \mu_I + \eta_{a,I} U_a$, and hence

$$\mathrm{Cov}(U_I, U_J \mid H_a) = \mu_{IJ} - \mu_I \mu_J + (\eta_{a,IJ} - \eta_{a,I}\mu_J - \mu_I \eta_{a,J})\,U_a - \eta_{a,I}\,\eta_{a,J}\,U_a^2. \tag{15}$$

Equation (12) now follows from substituting (15) into (14). □


3. Tree posets and tree cumulants

In this section, we use the theory of partially ordered sets to propose a further change
of coordinates. In the new coordinate system it is possible to parametrize the marginal
model $\mathcal{M}_T$ in a product form (see Proposition 4.1), in contrast to the complicated
polynomial mapping given in (7).

3.1. The poset of edge partitions

Let $T = (V,E)$ be a tree with $n$ leaves. We identify the set of leaves of $T$ with the set $[n]$.
For any $e \in E$ we let $T \setminus e$ denote the forest obtained from $T$ by removing $e$, that is, the
subgraph of $T$ given as a collection of disjoint trees with the set of nodes given by $V$
and the set of edges given by $E \setminus e$. Similarly, for any $E' \subseteq E$, we let $T \setminus E'$ denote the
forest obtained by removing all the edges in $E'$. An edge split is a partition of the set
of leaves, $[n]$, of $T$ into two non-empty sets induced by removing an edge $e$ from $E$ and
restricting $[n]$ to the connected components of $T \setminus e$. By an edge partition, we mean any
partition $B_1|B_2|\cdots|B_k$ of the set of leaves induced by considering connected components
of $T \setminus E'$ for some $E' \subseteq E$. Call each subset $B_i$ in this partition a block.

Henceforth let $\Pi_T$ denote the poset of all edge partitions of the set of leaves induced
by edges of $T$. The ordering is induced from the ordering of the poset of all partitions of
the set of leaves (see [20], Example 3.1.1.d). Thus, for two partitions, $\pi = B_1|\cdots|B_k$ and
$\nu = C_1|\cdots|C_l$, we write $\pi \leq \nu$ if every block of $\pi$ is contained in one of the blocks of $\nu$.
To make this more explicit, define the following equivalence relation on the subsets of $E$.
For $E_1, E_2 \subseteq E$ we say $E_1 \sim E_2$ if and only if removing $E_1$ induces the same partition of
the set of leaves $[n]$ as removing $E_2$. For example, in Figure 1 the partition $1|2|3$ can be
obtained either by removing any two edges or by removing all of them. However, the only
way to obtain the partition $12|3$ is by removing the edge incident with the third leaf.

Let $E_\pi$ denote the element of the equivalence class of subsets of $E$ inducing the
partition $\pi$ that is maximal with respect to inclusion. Suppose that $\pi \in \Pi_T$ is obtained by
removing the edges in $E_\pi$ and $\nu \in \Pi_T$ is obtained by removing
the edges in $E_\nu$. Write $\pi \leq \nu$ if and only if $E_\pi \supseteq E_\nu$ and call $\pi$ a subpartition of $\nu$.

An interval, $[\pi, \nu]$, for $\pi$ and $\nu$ in $\Pi_T$, is the set of all elements $\delta$ such that $\pi \leq \delta \leq \nu$.
The poset $\Pi_T$ forms a lattice (cf. [20], Section 3.3). To show this, we define $\pi \vee \nu \in \Pi_T$
($\pi \wedge \nu \in \Pi_T$) as an element in $\Pi_T$ obtained by removing $E_\pi \cap E_\nu$ ($E_\pi \cup E_\nu$). We have
$\pi \vee \nu \geq \pi$, $\pi \vee \nu \geq \nu$ ($\pi \wedge \nu \leq \pi$, $\pi \wedge \nu \leq \nu$) and, if there exists another $\delta \in \Pi_T$ such that
$\delta \geq \pi$, $\delta \geq \nu$ ($\delta \leq \pi$, $\delta \leq \nu$), then $\delta \geq \pi \vee \nu$ ($\delta \leq \pi \wedge \nu$). The element $\pi \vee \nu$ ($\pi \wedge \nu$) is called
the join (the meet) of $\pi$ and $\nu$. The poset has a unique minimal element, $1|2|\cdots|n$,
induced by removing all edges in $E$, and a unique maximal element, obtained with no edges
removed, which is equal to a single block, $[n]$. The maximal and minimal element of a lattice will be
denoted by $\hat{1}$ and $\hat{0}$, respectively.

The number of elements in these posets is typically large. However, the key concepts
can be presented using a simpler poset. Let $\widetilde{\Pi}_T$ denote a subposet of $\Pi_T$ containing
partitions obtained by removing only inner edges and consider, for example, the two
different trivalent trees $T$ and $T'$, both with six leaves, given below.

[Figure: the two trivalent trees $T$ and $T'$ with six leaves.]

Their associated posets, $\widetilde{\Pi}_T$ and $\widetilde{\Pi}_{T'}$, are, respectively,

[Figure: Hasse diagrams of the two posets; both have maximal element 123456, and the minimal element of the second is 12|3|4|56.]

So, for example, $12|34|56$ is an edge partition in $\widetilde{\Pi}_T$ and is a subpartition of any other
edge partition $\nu \in \widetilde{\Pi}_T$. It can be obtained by removing either any two inner edges from
$(a,b)$, $(b,c)$ and $(b,d)$, or all of them. Since, for $\pi = 12|34|56$, there are no subpartitions
of $\pi$, it follows that $\pi$ is the minimal element of $\widetilde{\Pi}_T$. In $\widetilde{\Pi}_{T'}$, there is only one way
to obtain this partition, namely, by removing $(a,b)$ and $(c,d)$. However, note that this
partition is not minimal in $\widetilde{\Pi}_{T'}$ because $12|3|4|56 \leq \pi$.

For any poset $\Pi$ a Möbius function $m_\Pi : \Pi \times \Pi \to \mathbb{R}$ is defined by $m_\Pi(\pi, \pi) = 1$ for
every $\pi \in \Pi$, $m_\Pi(\pi, \nu) = -\sum_{\pi \leq \delta < \nu} m_\Pi(\pi, \delta)$ for $\pi < \nu$ in $\Pi$ and is zero otherwise (cf. [20],
Section 3.7). Recall that for any $W \subseteq V$, $T(W)$ denotes the subtree of $T$ spanned on $W$
(see Section 2.1). We denote $m_{\Pi_{T(W)}} := m_W$ and $m_{\Pi_T} := m$, and let $\hat{0}_W$ and $\hat{1}_W$ denote
the minimal and the maximal element of $\Pi_{T(W)}$, respectively. For any partition $\pi \in \Pi_T$
the interval $[\hat{0}, \pi]$ has a natural structure of a product of posets for blocks of $\pi$,
namely $\prod_{B \in \pi} \Pi_{T(B)}$, where the product is over all blocks $B$ of $\pi$. By Proposition 3.8.2
in [20], the Möbius function on the product of posets $\prod_{B \in \pi} \Pi_{T(B)}$ can be written as
a product of Möbius functions for each of the posets $\Pi_{T(B)}$. Thus, for $\nu \leq \pi$ in $\Pi_T$

$$m(\nu, \pi) = \prod_{B \in \pi} m_B(\nu_B, \hat{1}_B), \tag{16}$$

where $\nu_B \in \Pi_{T(B)}$ is the restriction of $\nu \in \Pi_T$ to the block containing only elements from
$B \subseteq [n]$ (it is well defined since $\nu \leq \pi$) and $\pi_B = \hat{1}_B$ for each $B$.

In the next section, we will use the Möbius function of the poset of tree partitions to
derive a useful change of coordinates on $\mathcal{M}_T$.
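
The definitions of this subsection can be explored computationally. The following Python sketch enumerates the edge partitions of a small tree by removing every subset of edges, orders them by refinement, and evaluates the Möbius function by the defining recursion. The edge-list encoding of the tree and all function names are assumptions made for illustration; the enumeration is exponential in the number of edges.

```python
from functools import lru_cache
from itertools import combinations


def leaf_partition(nodes, kept_edges, leaves):
    """Partition of the leaves given by the connected components of the forest."""
    rep = {v: v for v in nodes}

    def find(v):
        while rep[v] != v:
            v = rep[v]
        return v

    for u, v in kept_edges:
        rep[find(u)] = find(v)
    blocks = {}
    for leaf in leaves:
        blocks.setdefault(find(leaf), set()).add(leaf)
    return frozenset(frozenset(b) for b in blocks.values())


def edge_partitions(edges, leaves):
    """All distinct leaf partitions induced by removing a subset of the edges."""
    nodes = {x for e in edges for x in e}
    found = set()
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            found.add(leaf_partition(nodes, kept, leaves))
    return found


def leq(pi, nu):
    """pi <= nu if every block of pi is contained in some block of nu."""
    return all(any(b <= c for c in nu) for b in pi)


def mobius_function(poset):
    """Return m(pi, nu) computed from m(pi, pi) = 1 and the defining recursion."""
    elems = tuple(poset)

    @lru_cache(maxsize=None)
    def m(pi, nu):
        if pi == nu:
            return 1
        if not leq(pi, nu):
            return 0
        return -sum(m(pi, d) for d in elems
                    if d != nu and leq(pi, d) and leq(d, nu))

    return m


def part(*blocks):
    """Helper to write a partition such as 12|34 as part([1, 2], [3, 4])."""
    return frozenset(frozenset(b) for b in blocks)


# Tripod tree of Figure 1 and a tree with four leaves and one inner edge (a, b).
tripod = [("h", 1), ("h", 2), ("h", 3)]
quartet = [("a", 1), ("a", 2), ("a", "b"), ("b", 3), ("b", 4)]
P3 = edge_partitions(tripod, [1, 2, 3])
P4 = edge_partitions(quartet, [1, 2, 3, 4])
m3 = mobius_function(P3)
m4 = mobius_function(P4)
```

For the tripod the poset coincides with the lattice of all partitions of a three-element set, whereas for the four-leaf tree the partitions 13|24 and 14|23 are not edge partitions.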

3.2. An induced change of coordinates 

Assume that each inner node of $T$ has degree at most three and consider a map,
$f_{\mu\kappa} : \mathbb{R}^n \times \mathbb{R}^{2^n} \to \mathbb{R}^n \times \mathbb{R}^{2^n}$, where the coordinates in the domain are denoted by $\lambda_1, \ldots, \lambda_n$ and $\mu_I$
for $I \subseteq [n]$ and the coordinates in the image are denoted by $\lambda_1, \ldots, \lambda_n$ and $\kappa_I$ for $I \subseteq [n]$.
The map is defined as the identity on the first $n$ coordinates corresponding to the means
and

$$\kappa_I = \sum_{\pi \in \Pi_{T(I)}} m_I(\pi, \hat{1}_I) \prod_{B \in \pi} \mu_B \quad \text{for all } I \subseteq [n]. \tag{17}$$

It is easy to prove that the Jacobian of $f_{\mu\kappa}$ is equal to 1, so, in particular, it is constant.
To see this, order the variables in such a way that the first $n$ coordinates both in $\mathcal{K}_T$
and $\mathcal{C}_n$ are $\lambda_1, \ldots, \lambda_n$ and let $\kappa_I$ precede $\kappa_J$ ($\mu_I$ precede $\mu_J$) as long as $I \subset J$. The
Jacobian matrix of $f_{\mu\kappa}$ is then lower triangular with each of its diagonal entries equal
to 1. It follows that its determinant is always 1.

The map, $f_{\mu\kappa}$, is a regular polynomial map with a regular polynomial inverse $f_{\kappa\mu}$.
Therefore, it gives a change of coordinates from the central moments with means to
a coordinate system given by $\lambda_1, \ldots, \lambda_n$ and $\kappa_I$ for $I \subseteq [n]$. Its inverse map is given
by

$$\mu_I = \sum_{\pi \in \Pi_{T(I)}} \prod_{B \in \pi} \kappa_B \quad \text{for all } I \in [n]_{\geq 2}. \tag{18}$$

To show (18), define two functions on $\Pi_{T(I)}$: $\alpha(\pi) = \prod_{B \in \pi} \mu_B$ and $\beta(\pi) = \prod_{B \in \pi} \kappa_B$. For
each $\pi \in \Pi_{T(I)}$, by (17),

$$\beta(\pi) = \prod_{B \in \pi} \kappa_B = \prod_{B \in \pi} \Big( \sum_{\nu_B \in \Pi_{T(B)}} m_B(\nu_B, \hat{1}_B) \prod_{C \in \nu_B} \mu_C \Big)
= \sum_{\nu \leq \pi} \prod_{B \in \pi} m_B(\nu_B, \hat{1}_B) \prod_{C \in \nu} \mu_C,$$

where $\nu$ is an element of $\Pi_{T(I)}$ such that its restriction to each of the blocks $B \in \pi$ is equal
to $\nu_B$. By the product formula in (16), we have $\prod_{B \in \pi} m_B(\nu_B, \hat{1}_B) = m_I(\nu, \pi)$. Therefore,
$\beta(\pi) = \sum_{\nu \leq \pi} m_I(\nu, \pi)\,\alpha(\nu)$ for all $\pi \in \Pi_{T(I)}$. Equation (18) now follows on applying the
Möbius inversion formula in Proposition 3.7.1 in [20].

Denote $\mathcal{K}_T = f_{\mu\kappa}(\mathcal{C}_n)$. Since $\mathcal{K}_T$ is contained in a subset of $\mathbb{R}^n \times \mathbb{R}^{2^n}$ given by
$\kappa_\emptyset = \kappa_1 = \cdots = \kappa_n = 0$, a system of coordinates on $\mathcal{K}_T$ is given by $\lambda_i$ for $i = 1, \ldots, n$ and $\kappa_I$
for $I \in [n]_{\geq 2}$. This system of coordinates is called tree cumulants. The name is justified
by (17) because one of the definitions of classical cumulants is the following. Let $\Pi(I)$
denote the set of all partitions of $I = \{i_1, \ldots, i_k\} \in [n]_{\geq 2}$ (see [20], Example 3.1.1.d).
Then, for all $k > 1$

$$\mathrm{Cum}(X_{i_1}, \ldots, X_{i_k}) = \sum_{\pi \in \Pi(I)} m_{\Pi(I)}(\pi, \hat{1}) \prod_{B \in \pi} \mu_B, \tag{19}$$

where the product is over all blocks of $\pi$. Moreover, for every $\pi \in \Pi(I)$

$$m_{\Pi(I)}(\pi, \hat{1}) = (-1)^{|\pi|-1}(|\pi|-1)!,$$

where $|\pi|$ denotes the number of blocks in $\pi$. Note that the usual definition of cumulants
uses non-central moments instead of central moments in (19). It can be shown that both
definitions are equivalent for all cumulants of order greater than one because the classical
cumulants are translation invariant. The definition in (19) is thus essentially the same
as (17) but with a different defining poset (cf. [12, 17]).

Using a basic result in the theory of lattices, Lemma 3.2 shows that certain features
of classical cumulants are also shared by tree cumulants (cf. Section 2.1 of [8]).

Lemma 3.1 (Corollary in [11], Section 5). Let $L$ be a finite lattice and let $\pi_0 \neq \hat{1}$
in $L$. Then, for any $\nu$ in $L$

$$\sum_{\pi \,:\, \pi \wedge \pi_0 = \nu} m(\pi, \hat{1}) = 0.$$

Lemma 3.2. Let $T$ be a tree with $n$ leaves. Whenever there exists an edge split $C_1|C_2 \in
\Pi_T$ of the set of leaves $[n]$ such that $X_{C_1} \perp\!\!\!\perp X_{C_2}$, then $\kappa_{1 \ldots n} = 0$.

Proof. Let $\pi_0$ be the split $C_1|C_2$ such that $X_{C_1} \perp\!\!\!\perp X_{C_2}$. It follows that $\mu_{1 \ldots n}$ is equal to
$\mu_{C_1}\mu_{C_2}$. More generally, for any $I \in [n]_{\geq 2}$,

$$\mu_I = \mu_{C_1 \cap I}\,\mu_{C_2 \cap I}.$$

Consequently, for any partition $\pi \in \Pi_T$

$$\prod_{B \in \pi} \mu_B = \prod_{B \in \pi \wedge \pi_0} \mu_B. \tag{20}$$

Using (17) and (20), we obtain

$$\kappa_{1 \ldots n} = \sum_{\pi \in \Pi_T} m(\pi, \hat{1}) \prod_{B \in \pi} \mu_B = \sum_{\pi \in \Pi_T} m(\pi, \hat{1}) \prod_{B \in \pi \wedge \pi_0} \mu_B.$$

Since $\pi \wedge \pi_0 \leq \pi_0$, by grouping all partitions $\pi \in \Pi_T$ giving the same partition after
taking the meet with $\pi_0$, we can rewrite the sum as

$$\kappa_{1 \ldots n} = \sum_{\nu \leq \pi_0} \Big( \sum_{\pi \wedge \pi_0 = \nu} m(\pi, \hat{1}) \Big) \prod_{B \in \nu} \mu_B.$$

However, this is zero since by Lemma 3.1 each of the sums $\sum_{\pi \wedge \pi_0 = \nu} m(\pi, \hat{1})$ is zero. □


4. The induced parametrization 

We now define a new parameter space, $\Omega_T$, with $|V| + |E|$ parameters denoted by $\eta_{u,v}$
for all $(u,v) \in E$ and $\bar\mu_v$ for all $v \in V$. The map between the two parameter spaces is
given by

$$\begin{aligned}
\eta_{u,v} &= \theta^{(v)}_{1|1} - \theta^{(v)}_{1|0} && \text{for all } (u,v) \in E \text{ and}\\
\bar\mu_v &= 1 - 2\lambda_v && \text{for each } v \in V,
\end{aligned} \tag{21}$$

where $\lambda_v$ is a polynomial in the original parameters in $\Theta_T$. The details are given in
Appendix A.2, where the inverse map is given by (36). It follows that the change of
parameters between $\Theta_T$ and $\Omega_T$ is a polynomial isomorphism.

It can be checked that if $\mathrm{Var}(Y_u) > 0$, then $\eta_{u,v} = \mathbb{E}(U_u U_v)/\mathrm{Var}(Y_u)$ is the regression
coefficient of $Y_v$ on $Y_u$. Therefore, $\eta_{u,v}$, defined above, coincides with the definition of $\eta_{u,v}$
in (11). If $\mathrm{Var}(Y_u) = 0$, then the formula in (11) is not well defined; however, (21) always
is.

Proposition 4.1 below motivates the whole section and demonstrates why our new
coordinate system is particularly useful. Henceforth let $\mathcal{M}^\kappa_T = (f_{\mu\kappa} \circ f_{p\mu})(\mathcal{M}_T) \subseteq \mathcal{K}_T$.

Proposition 4.1. Let $T = (V,E)$ be a rooted tree with $n$ leaves such that each inner node
has degree at most three. Then $\mathcal{M}^\kappa_T$ is given as the image of $\psi_T : \Omega_T \to \mathcal{K}_T$. Here $\psi_T$ is
defined by $\lambda_i = \frac{1}{2}(1 - \bar\mu_i)$ for $i = 1, \ldots, n$ and

$$\kappa_I = \frac{1}{4}\big(1 - \bar\mu_{r(I)}^2\big) \prod_{v \in V(I) \setminus I} \bar\mu_v^{\deg(v) - 2} \prod_{(u,v) \in E(I)} \eta_{u,v} \quad \text{for each } I \in [n]_{\geq 2}, \tag{22}$$

where the degree is taken in $T(I) = (V(I), E(I))$ and $r(I)$ denotes the root of $T(I)$ (cf.
Section 2.1).

The proof is given in Appendix B. 
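
To see the product form of Proposition 4.1 at work, the Python sketch below checks it numerically on a tree with inner nodes $a$ (the root) and $b$, leaves 1, 2 attached to $a$ and leaves 3, 4 attached to $b$. It relies on an assumption derived from (17): for this tree $\kappa_{1234} = \mu_{1234} - \mu_{12}\mu_{34}$, because every other edge partition has a singleton block, whose central moment is zero, and the Möbius value of $12|34$ is $-1$. All parameter values and names are hypothetical.

```python
from itertools import product


def bern(p1, value):
    """Probability that a binary variable with P(1) = p1 takes the given value."""
    return p1 if value == 1 else 1.0 - p1


# Hypothetical conditional probabilities (original parametrization).
p_a = 0.35                       # P(Y_a = 1)
b_given_a = (0.2, 0.8)           # P(Y_b = 1 | Y_a = 0), P(Y_b = 1 | Y_a = 1)
leaf = {1: (0.1, 0.6), 2: (0.3, 0.9), 3: (0.7, 0.2), 4: (0.25, 0.75)}
leaf_parent = {1: "a", 2: "a", 3: "b", 4: "b"}

# Marginal distribution of the four leaves, summing over the inner nodes.
p = {}
for ya, yb in product((0, 1), repeat=2):
    weight = bern(p_a, ya) * bern(b_given_a[ya], yb)
    for x in product((0, 1), repeat=4):
        prob = weight
        for i, xi in zip((1, 2, 3, 4), x):
            h = ya if leaf_parent[i] == "a" else yb
            prob *= bern(leaf[i][h], xi)
        p[x] = p.get(x, 0.0) + prob

lam = [sum(pr * x[i] for x, pr in p.items()) for i in range(4)]


def mu(indices):
    """Central moment E prod_{i in indices} (X_i - lambda_i)."""
    total = 0.0
    for x, pr in p.items():
        term = pr
        for i in indices:
            term *= x[i] - lam[i]
        total += term
    return total


# Tree cumulant of all four leaves (assumed reduction of (17), see lead-in).
kappa_1234 = mu((0, 1, 2, 3)) - mu((0, 1)) * mu((2, 3))

# New parameters: mu_bar_v = 1 - 2 P(Y_v = 1), eta_{u,v} = theta_{1|1} - theta_{1|0}.
lam_b = b_given_a[0] * (1.0 - p_a) + b_given_a[1] * p_a
mu_bar_a = 1.0 - 2.0 * p_a
mu_bar_b = 1.0 - 2.0 * lam_b
eta_product = b_given_a[1] - b_given_a[0]
for i in (1, 2, 3, 4):
    eta_product *= leaf[i][1] - leaf[i][0]

# Both inner nodes have degree three, so each contributes mu_bar to the power one.
formula = 0.25 * (1.0 - mu_bar_a ** 2) * mu_bar_a * mu_bar_b * eta_product
```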

By Lemma 2.4 we can obtain the parametrization of $\mathcal{M}_T$ for any non-trivalent tree
$T = (V,E)$ using a parametrization for its trivalent expansion $T^* = (V^*, E^*)$. Let $E'$ be
the subset of inner edges of $E^*$ given in Definition 2.3, so that $T^*/E' = T$. Let $\{V^*\}$
denote the equivalence classes of subsets of $V^*$ such that $v \sim v'$ if and only if $v$ becomes
identified with $v'$ in $T$ in the process of contracting $E'$ in $T^*$. There exists a natural
identification of $V$ with $\{V^*\}$. Let $\{v\}$ denote the equivalence class of $v \in V^*$ or the
corresponding node in $T$. In particular, since $E'$ is a set of inner edges, the class $\{i\}$ of
every leaf $i \in [n]$ can be naturally identified with $i$ and hence $\{V^* \setminus [n]\} = \{V^*\} \setminus [n]$.

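Lemma 4.2. Let $T$ be any tree and $T^*$ be its trivalent expansion. If $\kappa_I$ for $I \in [n]_{\geq 2}$
are tree cumulants of $T^*$, then $\mathcal{M}^\kappa_T$ is given in $\mathcal{K}_{T^*}$ as the image of a map that is the
identity on the coordinates corresponding to $\bar\mu_i$ for $i = 1, \ldots, n$ and, for each $I \in [n]_{\geq 2}$,

$$\kappa_I = \frac{1}{4}\big(1 - \bar\mu_{r(I)}^2\big) \prod_{v \in V(I) \setminus I} \bar\mu_v^{\deg(v) - 2} \prod_{(u,v) \in E(I)} \eta_{u,v}, \tag{23}$$

where $T(I) = (V(I), E(I))$ is the subtree of $T$ spanned on $I$.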

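Proof. By Lemma 2.4 and equation (8), $\mathcal{M}_T \subseteq \mathcal{M}_{T^*}$ is the image $f_{T^*}(\Theta_T)$, where $\Theta_T$
is the subset of $\Theta_{T^*}$ given by setting $\theta^{*(v)}_{\alpha_v|\alpha_u} = \delta_{\alpha_u \alpha_v}$ for every edge $(u,v) \in E'$ and
$\theta^{*(v)}_{\alpha_v|\alpha_u} = \theta^{(v)}_{\alpha_v|\alpha_u}$ otherwise. In the new parameters, $\Theta_T$ is isomorphic to the subset of $\Omega_{T^*}$
given by

$$\begin{aligned}
\eta^*_{u,v} &= \eta_{u,v} && \text{for all } (u,v) \in E,\\
\eta^*_{u,v} &= 1 && \text{for all } (u,v) \in E' \text{ and}\\
\bar\mu^*_v &= \bar\mu_{\{v\}} && \text{for all } v \in V^*.
\end{aligned} \tag{24}$$

Denote the root of $T^*$ by $r^*$. We show (23) for $I = [n]$. The general case can be proved with
an obvious change in notation. By Proposition 4.1, the model $\mathcal{M}_{T^*}$ is parametrized by

$$\kappa_{1 \ldots n} = \frac{1}{4}\big(1 - (\bar\mu^*_{r^*})^2\big) \prod_{v \in V^* \setminus [n]} (\bar\mu^*_v)^{\deg(v) - 2} \prod_{(u,v) \in E^*} \eta^*_{u,v}. \tag{25}$$

Since $E^* = E \cup E'$, by applying (24), $\prod_{(u,v) \in E^*} \eta^*_{u,v}$ becomes $\prod_{(u,v) \in E} \eta_{u,v}$, where we
have identified $E$ with $E^* \setminus E'$. For every $w \in V^*$, whenever $\deg\{w\} \geq 3$, we have that
$\deg\{w\} = |\{w\}| + 2$. Therefore, if $\deg\{w\} \geq 3$, then the degree of each $v \in \{w\}$ in $T^*$
equals 3. Hence

$$\sum_{v \in \{w\}} (\deg v - 2) = \sum_{v \in \{w\}} 1 = |\{w\}| = \deg\{w\} - 2.$$

It follows that, after applying (24), $\prod_{v \in \{w\}} (\bar\mu^*_v)^{\deg v - 2}$ becomes $\bar\mu_{\{w\}}^{\deg\{w\} - 2}$. The last
statement is also true if $\deg\{w\} = 2$. For, in this case, $\deg w = 2$ in $T^*$ and $w$ is the only
element in $\{w\}$. Moreover, $E'$ is necessarily contained in the set of inner edges of $T^*$. It
follows that $\prod_{v \in V^* \setminus [n]} (\bar\mu^*_v)^{\deg(v) - 2}$ in (25) becomes

$$\prod_{\{w\} \in \{V^*\} \setminus [n]} \bar\mu_{\{w\}}^{\deg\{w\} - 2} = \prod_{v \in V \setminus [n]} \bar\mu_v^{\deg(v) - 2}.$$

In addition, $\{r^*\}$ becomes the root of $T$ denoted by $r$. Therefore, (25) becomes

$$\kappa_{1 \ldots n} = \frac{1}{4}\big(1 - \bar\mu_r^2\big) \prod_{v \in V \setminus [n]} \bar\mu_v^{\deg(v) - 2} \prod_{(u,v) \in E} \eta_{u,v},$$

which is exactly (23) for $I = [n]$. □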

Remark 4.3. For every $v \in V$ the variance $\mathrm{Var}(Y_v)$ is zero if and only if $\bar\mu_v^2 = 1$. Hence, in the case when $\bar\mu_v^2 < 1$, the variable $Y_v$ is non-degenerate. In phylogenetics it is usually assumed that $\bar\mu_r^2 < 1$ for the root $r$ of $T$ and $\eta_{u,v} \neq 0$ for all $(u,v) \in E$ (cf. Conditions (M1) and (M2) in Section 8.2, [14]). It is shown in Section 8.2 in [14] that (M1) and (M2) imply the weaker condition $\bar\mu_v^2 < 1$ for all $v \in V$. Over the subset of $\Omega_T$ on which this weaker condition holds, we can apply another smooth transformation on both the parameter and model space. This leads to a further simplification of the parametrization in (22) presented in Appendix A.3.



5. Singularities and the geometry of unidentified subspaces

The identifiability of general Markov models can be addressed here geometrically. For any $q \in \mathcal{M}_T$ the preimage $\hat\Theta_T := f_T^{-1}(q)$, that is, the set of parameter values that is consistent with the known probability model $q$, is called the $q$-fiber. In this section, we analyze the geometry of these fibers, determining when they are finite and thus when the model is locally identifiable. We will also be interested in when the fibers are smooth subsets of $\Theta_T$ and when they are singular. We use methods similar to the ones presented in a different context by Moulton and Steel in [9], Section 6. The results in this section generalize similar results for the naive Bayes models (cf. [6], Theorem 7).

First we analyze the geometric description of $\Omega_T$. This gives a set of implicit inequalities constraining each $q$-fiber. Simple linear constraints defining $\Theta_T$ become only slightly more complicated when expressed in the new parameters. The choice of parameter values is not free anymore in the sense that the constraining equations for each of the parameters involve the values of other parameters. By (36), $\Omega_T$ is given by $\bar\mu_r \in [-1, 1]$ and for each $(u,v) \in E$

$$\begin{aligned} -(1 + \bar\mu_v) &\leq (1 - \bar\mu_u)\,\eta_{u,v} \leq (1 - \bar\mu_v), \\ -(1 - \bar\mu_v) &\leq (1 + \bar\mu_u)\,\eta_{u,v} \leq (1 + \bar\mu_v). \end{aligned} \tag{26}$$
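As an illustration, the following Python sketch checks the two inequality chains for a single edge and compares them with the requirement that both conditional probabilities recovered through the inverse map (36) lie in $[0,1]$. The function names are hypothetical and the numerical values are those of the edge $(r,1)$ in the quartet example of Section 6; this is a minimal sketch under these assumptions, not part of the original development.

```python
def edge_conditionals(mu_u, mu_v, eta):
    """P(child = 1 | parent = 0) and P(child = 1 | parent = 1), following (36)."""
    lam_v = (1 - mu_v) / 2
    theta_1_given_0 = lam_v - eta * (1 - mu_u) / 2
    theta_1_given_1 = lam_v + eta * (1 + mu_u) / 2
    return theta_1_given_0, theta_1_given_1


def satisfies_constraints(mu_u, mu_v, eta, tol=1e-12):
    """Check the two inequality chains for one edge (u, v)."""
    first = -(1 + mu_v) - tol <= (1 - mu_u) * eta <= (1 - mu_v) + tol
    second = -(1 - mu_v) - tol <= (1 + mu_u) * eta <= (1 + mu_v) + tol
    return first and second


def conditionals_are_probabilities(mu_u, mu_v, eta, tol=1e-12):
    """True if both conditionals from (36) lie in [0, 1]."""
    return all(-tol <= t <= 1 + tol for t in edge_conditionals(mu_u, mu_v, eta))


# Edge (r, 1) of the quartet example: mu_r = -0.6, mu_1 = -0.4, eta_{r,1} = 0.5.
print(edge_conditionals(-0.6, -0.4, 0.5))
print(satisfies_constraints(-0.6, -0.4, 0.5))
```

Each chain is simply one of the two conditional probabilities multiplied by two and shifted, which is why the two checks agree.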

For $\hat p \in \mathcal{M}_T$ let $\hat\Sigma = [\hat\mu_{ij}] \in \mathbb{R}^{n \times n}$ be the covariance matrix of the observed variables labelled by the leaves of $T$ computed with respect to $\hat p$. We show that the geometry of the $\hat p$-fiber, denoted by $\hat\Theta_T$, is determined by zeros in $\hat\Sigma$. Let $\hat\lambda_i$ be the expected value of $X_i$. Then, for every point in the $\hat p$-fiber, we have $\bar\mu_i = \hat{\bar\mu}_i = 1 - 2\hat\lambda_i$ for all $i = 1, \ldots, n$. Without loss we always assume that $\hat\lambda_i(1 - \hat\lambda_i) \neq 0$ (or, equivalently, that $\hat{\bar\mu}_i^2 \neq 1$) for all $i = 1, \ldots, n$.

It is easier to analyze the geometry of $\hat p$-fibers in $\Omega_T$. Therefore transform $\hat\Theta_T$ to $\Omega_T$ using the mapping $f_{\theta\omega}$. The image of this map, denoted by $\hat\Omega_T$, is isomorphic to $\hat\Theta_T$.

Figure 2. An example of a tree and a sample covariance matrix. The dashed lines depict the edges isolated with respect to $\hat p$.

Let $\hat\kappa_{ij}$ denote the corresponding second-order tree cumulants in the point $f_{p\kappa}(\hat p)$. Since $\hat\kappa_{ij} = \hat\mu_{ij}$ for all $i,j \in [n]$, from (22) for any $\omega_0 = ((\bar\mu^0_v), (\eta^0_{u,v})) \in \hat\Omega_T$ we have that

$$\hat\mu_{ij} = \kappa_{ij}(\omega_0) = \frac14 \bigl(1 - (\bar\mu^0_{r(ij)})^2\bigr) \prod_{(u,v) \in E(ij)} \eta^0_{u,v}. \tag{27}$$

We say that an edge $e \in E$ is isolated relative to $\hat p$ if $\hat\mu_{ij} = 0$ for all $i,j \in [n]$ such that $e \in E(ij)$. We denote the set of all edges of $T$ that are isolated relative to $\hat p$ by $\hat E \subseteq E$. We define the $\hat p$-forest $\hat T$ as the forest obtained from $T$ by removing edges in $\hat E$ so that $\hat T = T \setminus \hat E$. Hence, the set of vertices of $\hat T$ is equal to the set of vertices of $T$ and the set of edges is equal to $E \setminus \hat E$.

We illustrate this construction in the example below. Let $T$ be the tree given in Figure 2 and assume that the covariance matrix contains zeros given in the provided $7 \times 7$ matrix, where the asterisks mean any non-zero values such that the matrix is positive semidefinite. It can be checked that $\hat E = \{(b,c), (c,d), (c,e), (e,6), (e,7)\}$ and these edges are depicted as dashed lines. The forest $\hat T$ is obtained by removing the edges in $\hat E$.
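The set of isolated edges can be computed mechanically from the tree and the zero pattern of the covariance matrix. The Python sketch below does this for the example. The edge list is read off the text; the zero pattern is an assumption (the matrix itself appears only in Figure 2): we take the covariances to be non-zero exactly within $\{1,2,3\}$ and within $\{4,5\}$, which is consistent with the set of isolated edges stated above. All names are hypothetical.

```python
from itertools import combinations

# Tree of the example: leaves 1..7, inner nodes a..e.
EDGES = [(1, "a"), (2, "a"), ("a", "b"), ("b", 3), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", 4), ("d", 5), ("e", 6), ("e", 7)]
LEAVES = [1, 2, 3, 4, 5, 6, 7]
# ASSUMPTION: pairs of leaves with non-zero covariance (not given in the text).
NONZERO_PAIRS = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (4, 5)]}


def path_edges(edges, start, goal):
    """Edges on the unique path between two nodes of a tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack = [(start, None, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, path + [frozenset((node, nxt))]))
    raise ValueError("nodes are not connected")


def isolated_edges(edges, leaves, nonzero_pairs):
    """Edges e such that every pair of leaves whose path uses e has zero covariance."""
    supported = set()
    for i, j in combinations(leaves, 2):
        if frozenset((i, j)) in nonzero_pairs:
            supported.update(path_edges(edges, i, j))
    return {frozenset(e) for e in edges} - supported


print(isolated_edges(EDGES, LEAVES, NONZERO_PAIRS))
```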

We now define relations on $\hat E$ and $E \setminus \hat E$. For two edges $e, e'$ with either $\{e, e'\} \subseteq \hat E$ or $\{e, e'\} \subseteq E \setminus \hat E$, write $e \sim e'$ if either $e = e'$ or $e$ and $e'$ are adjacent and all the edges that are incident with both $e$ and $e'$ are isolated relative to $\hat p$. We now construct the transitive closure of $\sim$ restricted to pairs of edges in $\hat E$ to form an equivalence relation on $\hat E$. Consider a graph with nodes representing elements of $\hat E$ and put an edge between $e, e'$ whenever $e \sim e'$. Then the equivalence classes correspond to connected components of this graph. In the same way, we take the transitive closure of $\sim$ restricted to the pairs of edges in $E \setminus \hat E$ to form an equivalence relation in $E \setminus \hat E$. We will let $[\hat E]$ and $[E \setminus \hat E]$ denote the set of equivalence classes of $\hat E$ and $E \setminus \hat E$, respectively. For the tree from the example above, $[\hat E]$ is one element given by a subtree of $T$ spanned on $\{b, d, 6, 7\}$ and

$$[E \setminus \hat E] = \{\{(1,a)\}, \{(2,a)\}, \{(a,b),(b,3)\}, \{(d,4),(d,5)\}\}.$$
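The equivalence classes can be obtained as connected components, exactly as described. The Python sketch below reproduces them for the example. It rests on one assumption about the wording of the definition: we read "all the edges that are incident with both $e$ and $e'$" as all other edges meeting the node shared by $e$ and $e'$. The function name and data layout are hypothetical.

```python
EDGES = [(1, "a"), (2, "a"), ("a", "b"), ("b", 3), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", 4), ("d", 5), ("e", 6), ("e", 7)]
ISOLATED = {frozenset(e) for e in
            [("b", "c"), ("c", "d"), ("c", "e"), ("e", 6), ("e", 7)]}


def equivalence_classes(edges, isolated, within_isolated):
    """Classes of the relation on the isolated edges (True) or on the rest (False)."""
    all_edges = [frozenset(e) for e in edges]
    pool = [e for e in all_edges if (e in isolated) == within_isolated]
    parent = {e: e for e in pool}

    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e

    for idx, e in enumerate(pool):
        for f in pool[idx + 1:]:
            shared = e & f
            if len(shared) != 1:
                continue  # the two edges are not adjacent
            (node,) = shared
            # ASSUMED reading: every other edge at the shared node must be isolated.
            others = [g for g in all_edges if node in g and g not in (e, f)]
            if all(g in isolated for g in others):
                parent[find(e)] = find(f)

    classes = {}
    for e in pool:
        classes.setdefault(find(e), set()).add(e)
    return {frozenset(c) for c in classes.values()}


print(len(equivalence_classes(EDGES, ISOLATED, within_isolated=True)))
print(len(equivalence_classes(EDGES, ISOLATED, within_isolated=False)))
```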

By construction, all the inner nodes of $T$ have either degree zero in $\hat T$ or the degree is strictly greater than one. The following lemma shows that whenever the degree of an inner node in $\hat T$ is not zero, the node represents a non-degenerate random variable.

Lemma 5.1. Let $\hat p \in \mathcal{M}_T$. If $v \in V$ is an inner node of $T$ such that $\deg(v) \geq 2$ in the $\hat p$-forest $\hat T$, then the variable $H_v$ cannot be degenerate.

Proof. By construction, if $\deg(v) \geq 2$ in $\hat T$, then there exist $i,j \in [n]$ such that $\hat\mu_{ij} \neq 0$ and $v$ lies on the path between $i$ and $j$. Suppose that $H_v$ is degenerate. Then the global Markov properties in (4) imply that $X_i \perp\!\!\!\perp X_j$. But then $\hat\mu_{ij} = 0$ and we obtain the contradiction. □

We now list some basic statements, partly based on Lemma 6.4 in [9], which follow directly from the definitions above.

Remark 5.2. Let $T = (V,E)$ be a tree with $n$ leaves, let $\mathcal{M}_T$ be the corresponding general Markov model and suppose that $\hat p \in \mathcal{M}_T$.

(i) The edges in any equivalence class of $[\hat E]$ form a connected subgraph of $T$. If $T$ is trivalent, then this subgraph is either a single edge or a trivalent tree.

(ii) If each inner node of $T$ has degree at least two in $\hat T$, then all the equivalence classes in $[\hat E]$ are just single edges. If each inner node has degree at least three in $\hat T$, then all equivalence classes in $[E \setminus \hat E]$ are single edges.

(iii) The edges in any equivalence class in $[E \setminus \hat E]$ can be ordered so that they form a path in $T$.

(iv) Every connected component of $\hat T$ is either a single node or a tree with its set of leaves contained in $[n]$.

Lemma 5.3. Let $E(uv) \subseteq E$ be any path as in Remark 5.2(iii), which is an element of $[E \setminus \hat E]$. Then the quantities $\mu_{uv}^2$ and $\eta_{u,v}^2$ are constant on $\hat\Omega_T$ and non-zero. It is possible to determine their values from $\hat p$.

Proof. First note that the degree of each inner node on the path between $u$ and $v$ in $\hat T$ must be exactly two. Moreover, the degree of both $u$ and $v$ in $\hat T$ must be at least three unless $u$ or $v$ is a leaf. Consider the case when both $u$ and $v$ are inner nodes of $T$. In this case, these nodes have degrees at least three in $\hat T$ and we can find four leaves $i, j, k, l$ such that $u$ separates $i$ from $j$ in $\hat T$, $v$ separates $k$ and $l$ and $\{u,v\}$ separates $\{i,j\}$ from $\{k,l\}$ as in the graph below.



Furthermore, by construction, $\hat\mu_{ij}, \hat\mu_{kl}, \hat\mu_{ik}, \hat\mu_{jl}$ are all non-zero. Consider the marginal models for $T(ijk)$ and $T(ikl)$. By Corollary 2.2, these are equivalent to models associated with tripod trees as in Figure 1. Hence, from (3) we have that

$$\bar\mu_u^2 = \frac{\hat\mu_{ijk}^2}{\hat\mu_{ijk}^2 + 4\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk}}, \qquad \bar\mu_v^2 = \frac{\hat\mu_{ikl}^2}{\hat\mu_{ikl}^2 + 4\hat\mu_{ik}\hat\mu_{il}\hat\mu_{kl}}. \tag{28}$$

These equations are well defined since $\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk} > 0$ and $\hat\mu_{ik}\hat\mu_{il}\hat\mu_{kl} > 0$. Consider the quantity $\frac{\hat\mu_{ik}\hat\mu_{jl}}{\hat\mu_{ij}\hat\mu_{kl}}$ and substitute (27) for each of the terms. A simple rearrangement now gives that

$$\frac{\hat\mu_{ik}\hat\mu_{jl}}{\hat\mu_{ij}\hat\mu_{kl}} = \frac{1 - \bar\mu_u^2}{1 - \bar\mu_v^2}\,\eta_{u,v}^2(\omega),$$

where $\eta_{u,v}(\omega) = \prod_{(w,w') \in E(uv)} \eta_{w,w'}$. Therefore, substituting for $\bar\mu_u^2$ and $\bar\mu_v^2$ using (28) implies that $\eta_{u,v}^2$ is constant on $\hat\Omega_T$ and non-zero. Its value can be determined as a function of $\hat p$. Also the value of $\mu_{uv}^2$ is constant since $\mu_{uv}^2 = \bigl(\frac14 (1 - \bar\mu_u^2)\bigr)^2 \eta_{u,v}^2$.

If either $u$ or $v$ is a leaf of $T$, then the argument is very similar. Thus, if $u$ is a leaf, then consider any two leaves $i,j$ of $T$ such that $v$ separates $u, i, j$ in $\hat T$. In particular, as in (28),

$$\bar\mu_v^2 = \frac{\hat\mu_{uij}^2}{\hat\mu_{uij}^2 + 4\hat\mu_{ui}\hat\mu_{uj}\hat\mu_{ij}}.$$

Moreover, $\eta_{u,v}(\omega)$ must be determined, since from (27)

$$\frac{\hat\mu_{ui}\hat\mu_{uj}}{\hat\mu_{ij}} = \frac14 \bigl(1 - \bar\mu_v^2\bigr)\,\eta_{u,v}^2(\omega),$$

from which it follows that $\eta_{u,v}^2$ has to be constant on the $\hat p$-fiber. □

The following theorem shows that the geometry of the $\hat p$-fiber $\hat\Omega_T$ is determined by the zeros of the covariance matrix $\hat\Sigma$.

Theorem 5.4 (The geometry of the $\hat p$-fiber: the smooth case). Let $\hat p \in \mathcal{M}_T$. If each of the inner nodes of $T$ has degree at least three in the $\hat p$-forest $\hat T$, then the $\hat p$-fiber is a finite set of points of cardinality $2^{|V|-n}$. If each of the inner nodes of $T$ has degree at least two in $\hat T$, then the $\hat p$-fiber is diffeomorphic to a disjoint union of polyhedra. In particular, it is a manifold with corners. Its dimension is $2 l_2$, where $l_2$ is the number of degree-2 nodes in $\hat T$.

The proof is given in Appendix C. 

If $T$ is trivalent, then the $\hat p$-fiber is finite if and only if $\hat\mu_{ij} \neq 0$ for all $i,j \in [n]$. The proof of Theorem 5.4 provides explicit formulae for the parameters in this case when the $\hat p$-fiber is a finite number of points.

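Corollary 5.5. Let $T$ be a tree such that each inner node has degree at least three and let $\hat p \in \mathcal{M}_T$. Consider the $\hat p$-forest $\hat T$. If every inner node of $T$ has degree at least three in $\hat T$, then, by Remark 5.2(ii), both $[\hat E]$ and $[E \setminus \hat E]$ consist of singletons. In this case, every point in the $\hat p$-fiber satisfies

$$\bar\mu_i = \hat{\bar\mu}_i \ \text{ for all } i = 1, \ldots, n, \qquad \eta_{u,v} = 0 \ \text{ for all } (u,v) \in \hat E.$$

Moreover, for any inner node $v$ of $T$, if $i,j,k \in [n]$ are any three leaves separated by $v$ in $\hat T$ such that $\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk} \neq 0$, then

$$\bar\mu_v^2 = \frac{\hat\mu_{ijk}^2}{\hat\mu_{ijk}^2 + 4\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk}}.$$

For any terminal edge $(v,i) \in E \setminus \hat E$, where $v$ is an inner node and $i \in [n]$ is a leaf of $T$, let $j,k$ be any two leaves such that $v$ separates $i,j,k$ and $\hat\mu_{jk} \neq 0$. Then

$$\eta_{v,i}^2 = \frac{\hat\mu_{ijk}^2 + 4\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk}}{\hat\mu_{jk}^2}.$$

Moreover, for any inner edge $(u,v) \in E \setminus \hat E$ let $i,j,k,l \in [n]$ be any four leaves of $T$ such that $u$ separates $i$ and $j$ in $\hat T$, $v$ separates $k$ and $l$ in $\hat T$ and $(u,v)$ separates $\{i,j\}$ from $\{k,l\}$ in $\hat T$. Then

$$\eta_{u,v}^2 = \frac{\hat\mu_{il}^2}{\hat\mu_{ij}^2} \cdot \frac{\hat\mu_{ijk}^2 + 4\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk}}{\hat\mu_{ikl}^2 + 4\hat\mu_{ik}\hat\mu_{il}\hat\mu_{kl}}.$$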



Remark 5.6. The choice of signs of the $\bar\mu_v$ and $\eta_{u,v}$ in Corollary 5.5 is not completely free and has to be consistent with signs of tree cumulants via (22) (see Appendix D).

The singular case when there is at least one degree-zero inner node is more complicated. 
We begin with an example. 

Example 5.7. Let $T = (V,E)$ be the tripod tree rooted in the inner node as in Figure 1 and let $\hat p \in \mathcal{M}_T$. The degree of $h$ in the $\hat p$-forest $\hat T$ is less than two if and only if $\hat\mu_{ij} = 0$ for all $i \neq j = 1,2,3$. In this situation, $\hat E = E$ and the $\hat p$-fiber $\hat\Omega_T$ is given as a subset of $\Omega_T$ by equations for the sample means $\bar\mu_i = \hat{\bar\mu}_i$ for $i = 1,2,3$ together with the three additional equations

$$(1 - \bar\mu_h^2)\,\eta_{h,1}\eta_{h,2} = 0, \qquad (1 - \bar\mu_h^2)\,\eta_{h,1}\eta_{h,3} = 0, \qquad (1 - \bar\mu_h^2)\,\eta_{h,2}\eta_{h,3} = 0.$$

Geometrically, in the subspace given by $\bar\mu_i = \hat{\bar\mu}_i$ for $i = 1,2,3$, this is a union of two three-dimensional hyperplanes $\{\bar\mu_h = \pm 1\}$ and three planes given by $\{\eta_{h,1} = \eta_{h,2} = 0\}$, $\{\eta_{h,1} = \eta_{h,3} = 0\}$ and $\{\eta_{h,2} = \eta_{h,3} = 0\}$ subject to the additional inequality constraints defining $\Omega_T$ and given by (26). In particular, it is not a regular set since it has self-intersection points given by $1 - \bar\mu_h^2 = \eta_{h,1} = \eta_{h,2} = \eta_{h,3} = 0$.

This geometry is mirrored in the general case. We first need two definitions. We say that a node $v \in V$ is non-degenerate (with respect to $\hat p$) if either $v$ is a leaf of $T$ or $\deg v \geq 2$ in $\hat T$. Otherwise, we say that the node is degenerate with respect to $\hat p$. The set of all nodes that are degenerate with respect to $\hat p$ is denoted by $\hat V$. By Lemma 5.1, for all $v \in V \setminus \hat V$, $\mathrm{Var}(Y_v) \neq 0$, where the variance is computed with respect to $\hat p$. Hence $v$ is non-degenerate if and only if $Y_v$ is a non-degenerate random variable.

Figure 3. The quartet tree.

We define the deepest singularity of $\hat\Omega_T$ as

$$\hat\Omega_{\mathrm{deep}} := \{\omega \in \Omega_T : \eta_{u,v} = 0,\ \bar\mu_v^2 = 1 \text{ for all } (u,v) \in \hat E,\ v \in \hat V\}. \tag{30}$$

Theorem 5.8 (The geometry of the $\hat p$-fiber: the singular case). If $\hat V$ is non-empty, then the $\hat p$-fiber is a singular variety given as a union of intersecting smooth manifolds in $\mathbb{R}^{|V|+|E|}$ restricted to $\Omega_T$. Their common intersection locus restricted to $\Omega_T$ is given by $\hat\Omega_{\mathrm{deep}}$, which lies on the boundary of $\Omega_T$.

The proof is given in Appendix C. 



6. Example: The quartet tree model 

In this section, we study the first non-trivial example: the quartet tree model given by the tree in Figure 3. The model is parametrized as in (7) by the root distribution and conditional probabilities attached to each of the edges. We set the values of the parameters to $\theta^{(r)}_1 = 0.8$, $\theta^{(1)}_{1|1} = 0.8$, $\theta^{(1)}_{1|0} = 0.3$, $\theta^{(2)}_{1|1} = 0.7$, $\theta^{(2)}_{1|0} = 0.3$, $\theta^{(a)}_{1|1} = 0.8$, $\theta^{(a)}_{1|0} = 0.3$, $\theta^{(3)}_{1|1} = 0.7$, $\theta^{(3)}_{1|0} = 0.3$, $\theta^{(4)}_{1|1} = 0.7$, $\theta^{(4)}_{1|0} = 0.3$. Using (7) we can then calculate the corresponding probabilities over the observed nodes that are given in the third column in the table below. The change of coordinates $f_{p\lambda}$ presented in Appendix A.1 and $f_{\mu\kappa}$ in Section 3.2 gives the corresponding non-central moments and tree cumulants that are shown in Table 1. Formula (21) enables us to calculate the values for the new parameters as: $\eta_{r,1} = 0.5$, $\eta_{r,2} = 0.4$, $\eta_{r,a} = 0.5$, $\eta_{a,3} = 0.4$, $\eta_{a,4} = 0.4$ and $\bar\mu_1 = -0.4$, $\bar\mu_2 = -0.24$, $\bar\mu_3 = -0.16$, $\bar\mu_4 = -0.16$, $\bar\mu_r = -0.6$, $\bar\mu_a = -0.4$. It is easy to verify that (22) holds in this example. For instance,

$$\kappa_{1234} = \frac14 (1 - \bar\mu_r^2)\,\bar\mu_r \bar\mu_a\, \eta_{r,1}\eta_{r,2}\eta_{r,a}\eta_{a,3}\eta_{a,4} = 0.0006,$$

which equates with the value in the table. In general, higher-order tree cumulants tend to be very small.
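The passage from the conditional probabilities to the new parameters, and the value of $\kappa_{1234}$, can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout and the function name `to_omega` are hypothetical, and the map implemented is the one described in Appendix A.2 ($\eta_{u,v}$ as a difference of conditional probabilities, $\bar\mu_v = 1 - 2\lambda_v$).

```python
ROOT_P1 = 0.8  # P(Y_r = 1)
# (parent, child): (P(child = 1 | parent = 0), P(child = 1 | parent = 1)),
# listed so that every parent appears before its children.
COND = {("r", 1): (0.3, 0.8), ("r", 2): (0.3, 0.7), ("r", "a"): (0.3, 0.8),
        ("a", 3): (0.3, 0.7), ("a", 4): (0.3, 0.7)}


def to_omega(root_p1, cond):
    """Return (mu_bar, eta) from the root distribution and edge conditionals."""
    lam = {"r": root_p1}
    eta = {}
    for (u, v), (t0, t1) in cond.items():
        lam[v] = (1 - lam[u]) * t0 + lam[u] * t1   # mean of Y_v
        eta[(u, v)] = t1 - t0
    mu_bar = {v: 1 - 2 * l for v, l in lam.items()}
    return mu_bar, eta


mu_bar, eta = to_omega(ROOT_P1, COND)

# Formula (22) for I = {1, 2, 3, 4}: both inner nodes have degree three.
kappa_1234 = 0.25 * (1 - mu_bar["r"] ** 2) * mu_bar["r"] * mu_bar["a"]
for value in eta.values():
    kappa_1234 *= value

print(round(kappa_1234, 4))
```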

Table 1. Moments and tree cumulants for a probability assignment in $\mathcal{M}_T$

$\alpha$    $I$     $p_\alpha$   $\lambda_\alpha$   $\kappa_I$
0000    ∅       0.0444       1.0000
0001    4       0.0307       0.5800
0010    3       0.0307       0.5800
0011    34      0.0403       0.3700             0.0336
0100    2       0.0346       0.6200
0101    24      0.0323       0.3724             0.0128
0110    23      0.0323       0.3724             0.0128
0111    234     0.0547       0.2422            -0.0020
1000    1       0.0482       0.7000
1001    14      0.0491       0.4220             0.0160
1010    13      0.0491       0.4220             0.0160
1011    134     0.0875       0.2750            -0.0026
1100    12      0.0828       0.4660             0.0320
1101    124     0.0979       0.2853            -0.0038
1110    123     0.0979       0.2853            -0.0038
1111    1234    0.1875       0.1875             0.0006

If we have only tree cumulants $\kappa \in \mathcal{M}_T^{\kappa}$, we can still identify the parameters of the model up to the label switching on the inner nodes using Corollary 5.5. Recall that if $|I| \leq 3$, then $\kappa_I = \mu_I$ so, for example,



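$$\bar\mu_r^2 = \frac{\mu_{123}^2}{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}} = 0.36,$$

$$\eta_{r,1}^2 = \frac{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}{\mu_{23}^2} = 0.25,$$

$$\eta_{r,a}^2 = \frac{\mu_{14}^2}{\mu_{12}^2} \cdot \frac{\mu_{123}^2 + 4\mu_{12}\mu_{13}\mu_{23}}{\mu_{134}^2 + 4\mu_{13}\mu_{14}\mu_{34}} = 0.25.$$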


Note that the entries in Table 1 can be computed in several different ways. However, by Corollary 5.5 this does not matter. For instance, to compute $\bar\mu_r^2$ we picked 1, 2, 3 as three leaves separated by $r$. If, instead of 1, 2, 3, we used 1, 2, 4, the answer would be the same since

$$\bar\mu_r^2 = \frac{\mu_{124}^2}{\mu_{124}^2 + 4\mu_{12}\mu_{14}\mu_{24}} = 0.36.$$
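The identification formulas of Corollary 5.5 used in this example can be checked numerically. In the Python sketch below, the function names are hypothetical and the moments are not the rounded table entries but unrounded values recomputed from the parameter values of this section via (22) (for instance $\mu_{123} = 0.16 \cdot (-0.6) \cdot 0.5 \cdot 0.4 \cdot 0.5 \cdot 0.4$); with the four-digit table entries the results agree only approximately.

```python
def mu_bar_sq(m_ijk, m_ij, m_ik, m_jk):
    """Squared mu-bar of the inner node separating leaves i, j, k."""
    return m_ijk ** 2 / (m_ijk ** 2 + 4 * m_ij * m_ik * m_jk)


def eta_sq_terminal(m_ijk, m_ij, m_ik, m_jk):
    """Squared eta of the terminal edge (v, i), with v separating i, j, k."""
    return (m_ijk ** 2 + 4 * m_ij * m_ik * m_jk) / m_jk ** 2


def eta_sq_inner(m_il, m_ij, m_ijk, m_ik, m_jk, m_ikl, m_kl):
    """Squared eta of the inner edge (u, v) separating {i, j} from {k, l}."""
    numerator = m_ijk ** 2 + 4 * m_ij * m_ik * m_jk
    denominator = m_ikl ** 2 + 4 * m_ik * m_il * m_kl
    return (m_il / m_ij) ** 2 * numerator / denominator


# Unrounded moments of the quartet example (recomputed, see the lead-in).
M12, M13, M14, M23, M24, M34 = 0.032, 0.016, 0.016, 0.0128, 0.0128, 0.0336
M123, M124, M134 = -0.00384, -0.00384, -0.00256

print(round(mu_bar_sq(M123, M12, M13, M23), 6))        # leaves 1, 2, 3
print(round(mu_bar_sq(M124, M12, M14, M24), 6))        # leaves 1, 2, 4
print(round(eta_sq_terminal(M123, M12, M13, M23), 6))  # edge (r, 1)
print(round(eta_sq_inner(M14, M12, M123, M13, M23, M134, M34), 6))  # edge (r, a)
```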

Finally, in Appendix D we show that in this case we have exactly four possible distinct choices for combinations of signs of these parameters. The first one is the original one with all $\eta_{u,v} > 0$, which we denote by $\omega_1$:

$$\eta_{r,1} = 0.5, \quad \eta_{r,2} = 0.4, \quad \eta_{r,a} = 0.5, \quad \eta_{a,3} = 0.4, \quad \eta_{a,4} = 0.4, \quad \bar\mu_r = -0.6, \quad \bar\mu_a = -0.4,$$

where we omit $\bar\mu_i$ for $i = 1, 2, 3, 4$ since these are constant for all points in $\hat\Omega_T$. We obtain three remaining points by using local sign switching as defined in Appendix D, which are $(\eta_{r,1}, \eta_{r,2}, \eta_{r,a}, \eta_{a,3}, \eta_{a,4}, \bar\mu_r, \bar\mu_a) = (-0.5, -0.4, -0.5, 0.4, 0.4, 0.6, -0.4)$ or $(0.5, 0.4, -0.5, -0.4, -0.4, -0.6, 0.4)$ or $(-0.5, -0.4, 0.5, -0.4, -0.4, 0.6, 0.4)$.
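That the sign-switched points lie in the same fiber can be confirmed by evaluating (22) for every $I \subseteq [4]$ with $|I| \geq 2$. The Python sketch below does this under one assumption about Appendix D, which is not reproduced here: switching the label of an inner node $h$ is taken to flip the sign of $\bar\mu_h$ and of $\eta$ on every edge incident with $h$. The helper names are hypothetical.

```python
from itertools import combinations

PARENT = {1: "r", 2: "r", "a": "r", 3: "a", 4: "a"}   # quartet tree rooted at r
LEAVES = (1, 2, 3, 4)
MU_BAR = {"r": -0.6, "a": -0.4}                        # inner nodes only
ETA = {("r", 1): 0.5, ("r", 2): 0.4, ("r", "a"): 0.5, ("a", 3): 0.4, ("a", 4): 0.4}


def path_to_root(v):
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def kappa(I, mu_bar, eta):
    """Tree cumulant kappa_I from (22) on the subtree spanned by the leaves I."""
    paths = [path_to_root(i) for i in I]
    common = set(paths[0]).intersection(*paths[1:])
    root_I = next(v for v in paths[0] if v in common)   # deepest common ancestor
    edges = set()
    for p in paths:
        for child in p[:p.index(root_I)]:
            edges.add((PARENT[child], child))
    value = 0.25 * (1 - mu_bar[root_I] ** 2)
    degree = {}
    for u, v in edges:
        value *= eta[(u, v)]
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    for v, d in degree.items():
        if v not in I:                                  # inner nodes of T(I)
            value *= mu_bar[v] ** (d - 2)
    return value


def switch_labels(mu_bar, eta, nodes):
    """ASSUMED local sign switching at the given inner nodes."""
    new_mu = {v: (-m if v in nodes else m) for v, m in mu_bar.items()}
    new_eta = {(u, v): e * (-1) ** ((u in nodes) + (v in nodes))
               for (u, v), e in eta.items()}
    return new_mu, new_eta


FIBER = [switch_labels(MU_BAR, ETA, s) for s in [(), ("r",), ("a",), ("r", "a")]]
SUBSETS = [I for k in (2, 3, 4) for I in combinations(LEAVES, k)]

for mu_bar, eta in FIBER:
    print([round(kappa(I, mu_bar, eta), 4) for I in SUBSETS])
```

Each edge parameter changes sign once for every switched endpoint, so switching both $r$ and $a$ leaves $\eta_{r,a}$ unchanged.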

7. Discussion 

The reparametrization of Bayesian tree models with hidden variables given herein has illuminated the structure of these tree models and has enabled us to establish some identifiability results. However, the applicability of the new coordinate system reaches far beyond understanding identifiability. Some additional results will be presented in forthcoming papers where we generalize both results of [2] and [15], obtaining the full semi-algebraic description of this model class, and results of [13], on the asymptotic approximation of the marginal likelihood integrals.

The results given here can be extended in a straightforward way to the case when all 
hidden variables are binary but all leaf variables are arbitrary. It is less clear how the 
methods extend to tree models for arbitrary finite discrete random variables, or more 
generally, to other discrete graphical models. However, the extension to Gaussian models 
on trees appears to be straightforward. 

The definition of tree cumulants in (17) can be generalized using other posets than $\Pi_T$. This opens many interesting possibilities to investigate more general coordinate systems
for binary models. They all share certain useful properties of classical cumulants. In 
particular, Lemma 3.2 is true if the poset of tree partitions is replaced by any other 
lattice of partitions. We will report on this result in a forthcoming paper. 

Appendix A: Change of coordinates

A.1. From probabilities to central moments

Let $\Delta_{2^n-1}$ be the set of all possible probability distributions of a binary vector $X = (X_1, \ldots, X_n)$ as defined in (6). Let $\mathcal{C}_n$ be the set of all possible central moments $\mu_I$ for $I \in [n]_{\geq 2}$ and means $\lambda_1, \ldots, \lambda_n$. In this section, we show that there exists a polynomial isomorphism between $\Delta_{2^n-1}$ and $\mathcal{C}_n$.

First, perform a change of coordinates from the raw probabilities $p = [p_\alpha]$ to the non-central moments $\lambda = [\lambda_\alpha]$ for $\alpha = (\alpha_1, \ldots, \alpha_n) \in \{0,1\}^n$. This is a linear map $f_{p\lambda} : \mathbb{R}^{2^n} \to \mathbb{R}^{2^n}$, where $\lambda = f_{p\lambda}(p)$ is defined as follows:

$$\lambda_\alpha = \sum_{\alpha \leq \beta \leq \mathbf{1}} p_\beta \quad \text{for any } \alpha \in \{0,1\}^n, \tag{31}$$

where $\mathbf{1}$ denotes the vector of ones and the sum is over all binary vectors $\beta$ such that $\alpha \leq \beta \leq \mathbf{1}$ in the sense that $\alpha_i \leq \beta_i \leq 1$ for all $i = 1, \ldots, n$. In particular, $\lambda_{\mathbf{0}} = 1$ for all probability distributions. Therefore, the image $\mathcal{L}_n = f_{p\lambda}(\Delta_{2^n-1})$ is contained in the hyperplane defined by $\lambda_{\mathbf{0}} = 1$. The map $f_{p\lambda} : \Delta_{2^n-1} \to \mathcal{L}_n$ is invertible and hence we can obtain coordinates on $\mathcal{L}_n$ given by $\lambda_\alpha$ for all $\alpha \in \{0,1\}^n$ such that $\alpha \neq \mathbf{0}$. The inverse of $f_{p\lambda}$ is the map $f_{\lambda p} = f_{p\lambda}^{-1} : \mathcal{L}_n \to \Delta_{2^n-1}$, and is given by

$$p_\alpha = \sum_{\alpha \leq \beta \leq \mathbf{1}} (-1)^{|\beta - \alpha|} \lambda_\beta \quad \text{for } \alpha = (\alpha_1, \ldots, \alpha_n) \in \{0,1\}^n. \tag{32}$$
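The pair (31) and (32) is a Möbius inversion on the Boolean lattice and is easy to test on a small case. The Python sketch below uses a hypothetical distribution on $\{0,1\}^2$ (not taken from the paper) stored as a dictionary keyed by binary tuples; the function names are ours.

```python
def p_to_lambda(p):
    """Non-central moments: lambda_alpha = sum of p_beta over beta >= alpha, as in (31)."""
    return {a: sum(pb for b, pb in p.items()
                   if all(x <= y for x, y in zip(a, b)))
            for a in p}


def lambda_to_p(lam):
    """Inverse map (32): alternating sum over beta >= alpha."""
    return {a: sum((-1) ** (sum(b) - sum(a)) * lb
                   for b, lb in lam.items()
                   if all(x <= y for x, y in zip(a, b)))
            for a in lam}


# Hypothetical distribution of (X_1, X_2), used only for illustration.
P_EXAMPLE = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

lam = p_to_lambda(P_EXAMPLE)
back = lambda_to_p(lam)
print(lam)
print(back)
```

Here `lam[(1, 0)]` is the mean of $X_1$ and `lam[(1, 1)]` is $\mathbb{E}[X_1 X_2]$.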

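The linearity of the expectation implies that the central moments can be expressed in terms of non-central moments. In particular,

$$\mu_\alpha = \sum_{\mathbf{0} \leq \beta \leq \alpha} (-1)^{|\beta|} \lambda_{\alpha - \beta} \prod_{i=1}^{n} \lambda_{e_i}^{\beta_i} \quad \text{for } \alpha \in \{0,1\}^n, \tag{33}$$

where $|\beta| = \sum_i \beta_i$. Using these equations, we can transform variables from the non-central moments $[\lambda_\alpha]$ to another set of variables given by all the means $\lambda_{e_1}, \ldots, \lambda_{e_n}$, where $e_1, \ldots, e_n$ are standard basis vectors in $\mathbb{R}^n$, and central moments $[\mu_\alpha]$ for $\alpha \in \{0,1\}^n$. The polynomial mapping $f_{\lambda\mu} : \mathbb{R}^{2^n} \to \mathbb{R}^n \times \mathbb{R}^{2^n}$ is the identity on the first $n$ variables corresponding to the means $\lambda_{e_1}, \ldots, \lambda_{e_n}$ and is defined by (33) on the remaining variables. The image of $f_{\lambda\mu}$ is contained in the subspace $\mathcal{H} \subseteq \mathbb{R}^n \times \mathbb{R}^{2^n}$ given by $\mu_{e_1} = \cdots = \mu_{e_n} = 0$. It is easy to show (see, e.g., equation (5), [3]) that the inverse of $f_{\lambda\mu} : \mathbb{R}^{2^n} \to \mathcal{H}$ is given as $f_{\mu\lambda} = f_{\lambda\mu}^{-1} : \mathcal{H} \to \mathbb{R}^{2^n}$ defined by

$$\lambda_\alpha = \sum_{\mathbf{0} \leq \beta \leq \alpha} \mu_{\alpha - \beta} \prod_{i=1}^{n} \lambda_{e_i}^{\beta_i} \quad \text{for } \alpha \in \{0,1\}^n. \tag{34}$$

Let $\mathcal{C}_n$ denote $f_{\lambda\mu}(\mathcal{L}_n)$. Then $\mathcal{C}_n$ is contained in $\mathcal{H}$ and $\mu_{\mathbf{0}} = 1$. We have, therefore, obtained coordinates of $\mathcal{C}_n$ given by $\lambda_{e_1}, \ldots, \lambda_{e_n}$ together with $\mu_\alpha$ for all $\alpha \in \{0,1\}^n$ such that $|\alpha| \geq 2$.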
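Formulas (33) and (34) can likewise be checked on a small case. The Python sketch below is a minimal illustration with hypothetical names; the non-central moments are those of an illustrative two-variable distribution (means 0.7 and 0.6, $\mathbb{E}[X_1 X_2] = 0.4$), not data from the paper.

```python
from itertools import product
from math import prod


def sub_vectors(alpha):
    """All binary vectors beta with 0 <= beta <= alpha componentwise."""
    return product(*[(0, 1) if a else (0,) for a in alpha])


def lambda_to_mu(lam):
    """Means and central moments from non-central moments, following (33)."""
    n = len(next(iter(lam)))
    mean = [lam[tuple(int(j == i) for j in range(n))] for i in range(n)]
    mu = {}
    for alpha in lam:
        mu[alpha] = sum(
            (-1) ** sum(beta)
            * lam[tuple(a - b for a, b in zip(alpha, beta))]
            * prod(m ** b for m, b in zip(mean, beta))
            for beta in sub_vectors(alpha))
    return mean, mu


def mu_to_lambda(mean, mu):
    """Non-central moments from means and central moments, following (34)."""
    return {alpha: sum(
                mu[tuple(a - b for a, b in zip(alpha, beta))]
                * prod(m ** b for m, b in zip(mean, beta))
                for beta in sub_vectors(alpha))
            for alpha in mu}


# Hypothetical non-central moments of (X_1, X_2), used only for illustration.
LAM_EXAMPLE = {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.4}

mean, mu = lambda_to_mu(LAM_EXAMPLE)
print(mean, mu)
print(mu_to_lambda(mean, mu))
```

The entry `mu[(1, 1)]` is the covariance $0.4 - 0.7 \cdot 0.6$, and the first-order central moments vanish, in line with the description of the subspace $\mathcal{H}$.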



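A.2. A reparametrization for general Markov models

Let $T = (V,E)$ be a rooted tree with $n$ leaves and root $r$. Note that for a tree $1 + 2|E| = |V| + |E|$ so the number of free parameters in (5) and (7) is $|V| + |E|$. We define a polynomial map $f_{\theta\omega} : \mathbb{R}^{|V|+|E|} \to \mathbb{R}^{|V|+|E|}$ from the original set of parameters of $\Theta_T$ given by the root distribution and the conditional probabilities for each of the edges to a set of parameters given as follows:

$$\begin{aligned} \eta_{u,v} &= \theta^{(v)}_{1|1} - \theta^{(v)}_{1|0} && \text{for each } (u,v) \in E \text{ and} \\ \bar\mu_v &= 1 - 2\lambda_v && \text{for each } v \in V, \end{aligned} \tag{35}$$

where $\lambda_v = \mathbb{E} Y_v$ is a polynomial in the original parameters $\theta$ of degree depending on the path from the root to $v$. Let $(r, v_1, \ldots, v_k, v)$ be a directed path in $T$. Then

$$\lambda_v = \sum_{\alpha \in \{0,1\}^{k+1}} \theta^{(v)}_{1|\alpha_k}\, \theta^{(v_k)}_{\alpha_k|\alpha_{k-1}} \cdots \theta^{(v_1)}_{\alpha_1|\alpha_0}\, \theta^{(r)}_{\alpha_0},$$

where $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_k)$ collects the states of $r, v_1, \ldots, v_k$.

Let $\Omega_T = f_{\theta\omega}(\Theta_T)$. The inverse map $f_{\omega\theta} : \Omega_T \to \Theta_T$ has the following form. For each edge $(u,v) \in E$ we have

$$\theta^{(v)}_{1|0} = \frac{1 - \bar\mu_v}{2} - \eta_{u,v}\,\frac{1 - \bar\mu_u}{2}, \qquad \theta^{(v)}_{1|1} = \frac{1 - \bar\mu_v}{2} + \eta_{u,v}\,\frac{1 + \bar\mu_u}{2} \tag{36}$$

and $\theta^{(r)}_1 = \frac{1 - \bar\mu_r}{2}$.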
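The path-sum expression for $\lambda_v$ in this subsection can be evaluated by brute force. The Python sketch below is a minimal illustration with a hypothetical function name; the conditional probabilities are those of the quartet example in Section 6, along the paths $r \to a \to 3$ and $r \to 1$.

```python
from itertools import product


def node_mean(path_cond, root_p1):
    """Mean of Y_v by summing over the states of the root and intermediate nodes.

    path_cond lists, for each edge on the directed path from the root to v,
    the pair (P(child = 1 | parent = 0), P(child = 1 | parent = 1)).
    """
    total = 0.0
    for alpha in product((0, 1), repeat=len(path_cond)):
        weight = root_p1 if alpha[0] == 1 else 1 - root_p1
        for step in range(1, len(alpha)):
            p1 = path_cond[step - 1][alpha[step - 1]]
            weight *= p1 if alpha[step] == 1 else 1 - p1
        weight *= path_cond[-1][alpha[-1]]   # P(Y_v = 1 | state of its parent)
        total += weight
    return total


lambda_3 = node_mean([(0.3, 0.8), (0.3, 0.7)], root_p1=0.8)   # path r -> a -> 3
lambda_1 = node_mean([(0.3, 0.8)], root_p1=0.8)               # path r -> 1
print(round(1 - 2 * lambda_3, 4), round(1 - 2 * lambda_1, 4))
```

The printed values are $\bar\mu_3$ and $\bar\mu_1$ obtained through (35).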



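A.3. The non-degenerate case

In this section, we derive the submodel of $\mathcal{M}_T^{\kappa} = \psi_T(\Omega_T)$, defined as the image of $\psi_T$ constrained to the subset $\Omega_T^0$ of $\Omega_T$ given by $\bar\mu_v^2 < 1$ for all $v \in V$. We define a smooth transformation on $\Omega_T^0$ that enables us to change coordinates from $((\bar\mu_v), (\eta_{u,v}))$ to $((\rho_v), (\rho_{uv}))$, where

$$\rho_v = \frac{2\bar\mu_v}{\sqrt{1 - \bar\mu_v^2}}, \qquad \rho_{uv} = \sqrt{\frac{1 - \bar\mu_u^2}{1 - \bar\mu_v^2}}\;\eta_{u,v}. \tag{37}$$

It is easily checked that this map is invertible since

$$\bar\mu_v = \frac{\rho_v}{\sqrt{4 + \rho_v^2}}, \qquad \eta_{u,v} = \sqrt{\frac{4 + \rho_u^2}{4 + \rho_v^2}}\;\rho_{uv}. \tag{38}$$

The inequality constraints defining $\Omega_T^0$ are given by (26) and the fact that $\bar\mu_v \in (-1,1)$ for all $v \in V$. To express this in terms of the new coordinates, let $t_v$ be defined by

$$t_v = \sqrt{1 + \left(\frac{\rho_v}{2}\right)^2} + \frac{\rho_v}{2} \in (0, \infty). \tag{39}$$

Then (26) becomes

$$-t_u t_v \leq \rho_{uv} \leq \frac{t_u}{t_v}, \qquad -\frac{1}{t_u t_v} \leq \rho_{uv} \leq \frac{t_v}{t_u}. \tag{40}$$

Transform the tree cumulants to a new coordinate system given by $\rho_1, \ldots, \rho_n$ and

$$\rho_I = \frac{\kappa_I}{\sqrt{\prod_{i \in I} \mu_{ii}}} \quad \text{for all } I \in [n]_{\geq 2}, \tag{41}$$

so that $\rho_{ij}$ is the correlation between $X_i$ and $X_j$. The change of coordinates on $\Omega_T^0$ and $\mathcal{K}_T$ induces a new parametrization of $\mathcal{M}_T^0$. The parametrization is given by the identity on the first $n$ coordinates corresponding to $\rho_i$ for $i = 1, \ldots, n$ and

$$\rho_I = \prod_{v \in V(I) \setminus I} \rho_v^{\deg(v)-2} \prod_{(u,v) \in E(I)} \rho_{uv} \quad \text{for all } I \in [n]_{\geq 2}. \tag{42}$$

In particular, each $\rho_I$ has an attractive monomial form. To prove (42), simply substitute (38) and (41) into (22), using $\mu_{ii} = \frac14 (1 - \bar\mu_i^2) = 1/(4 + \rho_i^2)$, to obtain

$$\rho_I \prod_{i \in I} \frac{1}{\sqrt{4 + \rho_i^2}} = \frac{1}{4 + \rho_{r(I)}^2} \prod_{v \in V(I) \setminus I} \left(\frac{\rho_v}{\sqrt{4 + \rho_v^2}}\right)^{\deg v - 2} \prod_{(u,v) \in E(I)} \sqrt{\frac{4 + \rho_u^2}{4 + \rho_v^2}}\;\rho_{uv}$$

or, equivalently,

$$\begin{aligned} \rho_I = {} & \prod_{v \in V(I) \setminus I} \rho_v^{\deg v - 2} \prod_{(u,v) \in E(I)} \rho_{uv} \\ & \times \frac{1}{4 + \rho_{r(I)}^2} \prod_{i \in I} \sqrt{4 + \rho_i^2} \prod_{v \in V(I) \setminus I} (4 + \rho_v^2)^{-(\deg v - 2)/2} \prod_{(u,v) \in E(I)} \sqrt{\frac{4 + \rho_u^2}{4 + \rho_v^2}}. \end{aligned}$$

Next, we show that the term in the second line of the equation above is equal to one. This follows from the fact that every $v \in V(I)$ apart from the root is a parent of exactly $\deg(v) - 1$ nodes and has one parent; the root has no parents and is a parent of $\deg(r(I))$ nodes.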
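The monomial form (42) can be verified numerically on the quartet example of Section 6. The Python sketch below assumes that (37) is the standardization $\rho_v = 2\bar\mu_v/\sqrt{1-\bar\mu_v^2}$, $\rho_{uv} = \sqrt{(1-\bar\mu_u^2)/(1-\bar\mu_v^2)}\,\eta_{u,v}$ and that (41) divides $\kappa_I$ by the square root of the product of the leaf variances $\frac14(1 - \bar\mu_i^2)$; the variable names are hypothetical.

```python
from math import isclose, sqrt

MU_BAR = {"r": -0.6, "a": -0.4, 1: -0.4, 2: -0.24, 3: -0.16, 4: -0.16}
ETA = {("r", 1): 0.5, ("r", 2): 0.4, ("r", "a"): 0.5, ("a", 3): 0.4, ("a", 4): 0.4}

# Standardized node and edge parameters (assumed form of (37)).
rho_node = {v: 2 * m / sqrt(1 - m * m) for v, m in MU_BAR.items()}
rho_edge = {(u, v): sqrt((1 - MU_BAR[u] ** 2) / (1 - MU_BAR[v] ** 2)) * e
            for (u, v), e in ETA.items()}

# Left-hand side of (42): kappa_1234 from (22), standardized as in (41).
kappa_1234 = 0.25 * (1 - MU_BAR["r"] ** 2) * MU_BAR["r"] * MU_BAR["a"]
for e in ETA.values():
    kappa_1234 *= e
variance_product = 1.0
for leaf in (1, 2, 3, 4):
    variance_product *= 0.25 * (1 - MU_BAR[leaf] ** 2)
rho_1234 = kappa_1234 / sqrt(variance_product)

# Right-hand side of (42): both inner nodes have degree three.
monomial = rho_node["r"] * rho_node["a"]
for r_uv in rho_edge.values():
    monomial *= r_uv

print(isclose(rho_1234, monomial, rel_tol=1e-9))
```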



Appendix B: Proof of Proposition 4.1 

It suffices to prove (22) for $I = [n]$ because the general result for $I \subseteq [n]$ obviously follows by restriction to the subtree $T(I)$ since each inner node of $T(I)$ has degree at most three. The proof proceeds by induction with respect to the number of leaves of $T$. First, we show that the result is true for $n = 2$. Since by definition $\kappa_{12} = \mu_{12}$ we need to prove that

$$\mu_{12} = \frac14 \bigl(1 - \bar\mu_r^2\bigr) \prod_{(u,v) \in E} \eta_{u,v}, \tag{43}$$

where $r$ is the root of $T$. If any of the nodes of $V$ represents a degenerate random variable, then the global Markov properties in (4) imply that $X_1 \perp\!\!\!\perp X_2$. In this case, the left-hand side of (43) is zero. However, as we show next, one of the factors on the right-hand side of (43) must vanish as well. We prove this by contradiction. Suppose that both $\bar\mu_r^2 \neq 1$ and $\eta_{u,v} \neq 0$ for all $(u,v) \in E$. By Remark 4.3, this implies that all the nodes of $T$ represent non-degenerate random variables, which leads to contradiction.

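So assume now that every random variable in the system is non-degenerate. From (12), by taking $I = \{1\}$, $J = \{2\}$, we have

$$\mu_{12} = \frac14 \bigl(1 - \bar\mu_r^2\bigr)\,\eta_{r,1}\eta_{r,2}$$

so it suffices to show that

$$\begin{aligned} (1 - \bar\mu_r^2)\,\eta_{r,1} &= (1 - \bar\mu_r^2) \prod_{(u,v) \in E(r1)} \eta_{u,v} \quad \text{and} \\ (1 - \bar\mu_r^2)\,\eta_{r,2} &= (1 - \bar\mu_r^2) \prod_{(u,v) \in E(r2)} \eta_{u,v}. \end{aligned} \tag{44}$$

If $r = 1$ or $r$ is a parent of 1, then the first equation in (44) is trivially satisfied. Assume that the length of the path between $r$ and 1 is greater than one. Let $(r, h_m, h_{m-1}, \ldots, h_1, 1)$ be the directed path $E(r1)$ joining $r$ with 1. Then, because $Y_r \perp\!\!\!\perp Y_1 \mid Y_{h_1}$, by (12) we have that

$$\frac14 \bigl(1 - \bar\mu_r^2\bigr)\,\eta_{r,1} = \mu_{r1} = \frac14 \bigl(1 - \bar\mu_{h_1}^2\bigr)\,\eta_{h_1,r}\,\eta_{h_1,1}. \tag{45}$$

Similarly, because $Y_r \perp\!\!\!\perp Y_{h_k} \mid Y_{h_{k+1}}$ for each $k = 1, \ldots, m-1$, then again by (12)

$$\frac14 \bigl(1 - \bar\mu_{h_k}^2\bigr)\,\eta_{h_k,r} = \frac14 \bigl(1 - \bar\mu_{h_{k+1}}^2\bigr)\,\eta_{h_{k+1},r}\,\eta_{h_{k+1},h_k}.$$

Substituting this expression for all subsequent $k = 1, \ldots, m-1$ into (45) we can now conclude that

$$\frac14 \bigl(1 - \bar\mu_r^2\bigr)\,\eta_{r,1} = \frac14 \bigl(1 - \bar\mu_{h_m}^2\bigr)\,\eta_{h_m,r}\,\eta_{h_m,h_{m-1}} \cdots \eta_{h_2,h_1}\,\eta_{h_1,1}. \tag{46}$$

But since $\frac14 (1 - \bar\mu_{h_m}^2)\,\eta_{h_m,r} = \mu_{r h_m} = \frac14 (1 - \bar\mu_r^2)\,\eta_{r,h_m}$, equation (46) implies that

$$(1 - \bar\mu_r^2)\,\eta_{r,1} = (1 - \bar\mu_r^2) \prod_{(u,v) \in E(r1)} \eta_{u,v}. \tag{47}$$

The second equation in (44) is proved simply by changing the index from 1 to 2 above.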

Now assume the proposition is true for all $k \leq n-1$ and let $T$ be a tree with $n$ leaves. If one of the inner nodes of $T$ is degenerate, then by the global Markov properties in (4) there exists an edge split $C_1 | C_2$ of the set of leaves such that $X_{C_1} \perp\!\!\!\perp X_{C_2}$. The left-hand side is zero by Lemma 3.2. Again, by Remark 4.3, if both $\bar\mu_r^2 \neq 1$ and $\eta_{u,v} \neq 0$ for all $(u,v) \in E$, then $\bar\mu_v^2 \neq 1$ for all $v \in V$. Hence, on the right-hand side of equation (43), either $\bar\mu_r^2 = 1$ or one of the $\eta_{u,v}$ vanishes. Consequently, (43) is satisfied.

We assume now that all the inner nodes of $T$ represent non-degenerate random variables. As $n \geq 3$, we can always find two leaves separated from all the other leaves by an inner node. We shall call such a pair an extended cherry. Denote the leaves by 1, 2 and the inner node by $a$. Let $A = \{3, \ldots, n\}$ and let $T(aA)$ be the minimal subtree of $T$ spanned on $a \cup A$. Note that the global Markov properties in (4) give that, for each $C \subseteq A$, we have $(X_1, X_2) \perp\!\!\!\perp X_C \mid H_a$. Using (12), we can conclude that

$$\mu_{12C} = \mu_{12}\mu_C + \frac14 \bigl(1 - \bar\mu_a^2\bigr)\,\eta_{a,12}\,\eta_{a,C} = \mu_{12}\mu_C + \eta_{a,12}\,\mu_{aC}. \tag{48}$$

Let $e \in E$ be the edge incident with $a$ separating 1 and 2 from all other leaves, that is, such that $e$ induces the split $\nu = 12|A$. For each $\pi \in \Pi_T$, if $\pi$ is induced by removing $E_\pi \subseteq E$, then $\pi \wedge \nu$ is induced by removing $E_\pi \cup e$. Let $\rho = 12|\hat 0_A \in \Pi_T$. Since $\{1,2\}$ forms an extended cherry and all the inner nodes of $T$ have degree at most three, it follows that $a$ necessarily has degree three in $T$ and is a leaf of $T(aA)$. The trimming map with respect to $\{1,2\}$ is the map $[\rho, \hat 1] \to \Pi_{T(aA)}$ such that $\pi \mapsto \tilde\pi$ is defined by changing the block $12C$ in $\pi \in [\rho, \hat 1]$ to $aC$. Note that the trimming map constitutes an isomorphism of posets between $[\rho, \hat 1]$ and $\Pi_{T(aA)}$.

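It follows from the definition of tree cumulants in (17) that

$$\kappa_{1 \ldots n} = \sum_{\pi \in [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \pi} \mu_B + \sum_{\pi \notin [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \pi} \mu_B. \tag{49}$$

The second summand in (49) is zero since every $\pi \in \Pi_T$ such that $\pi \notin [\rho, \hat 1]$ necessarily contains either 1 or 2 as one of the blocks and $\mu_1 = \mu_2 = 0$. Applying (48) to each $\mu_{12C}$ for each $\pi \in [\rho, \hat 1]$, we obtain

$$\prod_{B \in \pi} \mu_B = \prod_{B \in \pi \wedge \nu} \mu_B + \eta_{a,12} \prod_{B \in \tilde\pi} \mu_B$$

and hence

$$\kappa_{1 \ldots n} = \sum_{\pi \in [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \pi \wedge \nu} \mu_B + \eta_{a,12} \sum_{\pi \in [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \tilde\pi} \mu_B. \tag{50}$$

The first summand in (50) can be rewritten as

$$\sum_{\delta} \Biggl[ \Biggl( \sum_{\pi \wedge \nu = \delta} m(\pi, \hat 1) \Biggr) \prod_{B \in \delta} \mu_B \Biggr]. \tag{51}$$

However, from Lemma 3.1, since $\nu \neq \hat 1$, for each $\delta$ the sum $\sum_{\pi \wedge \nu = \delta} m(\pi, \hat 1)$ is zero. It follows that

$$\kappa_{1 \ldots n} = \eta_{a,12} \sum_{\pi \in [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \tilde\pi} \mu_B.$$

By Proposition 4 in [11], the Möbius function of $[\rho, \hat 1]$ is equal to the restriction of the Möbius function on $\Pi_T$ to the interval $[\rho, \hat 1]$. The trimming map constitutes an isomorphism between $[\rho, \hat 1]$ and $\Pi_{T(aA)}$. Consequently, the Möbius function on $[\rho, \hat 1]$ is equal to the Möbius function on $\Pi_{T(aA)}$. It follows that

$$\kappa_{1 \ldots n} = \eta_{a,12} \Biggl( \sum_{\pi \in [\rho, \hat 1]} m(\pi, \hat 1) \prod_{B \in \tilde\pi} \mu_B \Biggr) = \eta_{a,12} \Biggl( \sum_{\tilde\pi \in \Pi_{T(aA)}} m_{aA}(\tilde\pi, \hat 1_{aA}) \prod_{B \in \tilde\pi} \mu_B \Biggr) = \eta_{a,12}\,\kappa_{aA}.$$

Since $X_1 \perp\!\!\!\perp X_2 \mid H_a$, by the second equation in Proposition 2.5, $\eta_{a,12} = \bar\mu_a \eta_{a,1} \eta_{a,2}$. Since $|aA| = n - 1$, by using the induction assumption

$$\kappa_{aA} = \frac14 \bigl(1 - \bar\mu_{r(aA)}^2\bigr) \prod_{v \in V(aA) \setminus aA} \bar\mu_v^{\deg(v)-2} \prod_{(u,v) \in E(aA)} \eta_{u,v},$$

where the degree is taken in $T(aA)$. We have two possible scenarios: either $r(aA) = r$ or $r(aA) = a$. In the first case, $r(a1) = r(a2) = a$ and by (47)

$$\eta_{a,1}\eta_{a,2} = \prod_{(u,v) \in E(12)} \eta_{u,v}$$

and hence

$$\kappa_{1 \ldots n} = \Biggl( \bar\mu_a \prod_{(u,v) \in E(12)} \eta_{u,v} \Biggr) \kappa_{aA}. \tag{52}$$

In the second case, either $r(a1) = a$ and $r(a2) = r$ or $r(a1) = r$ and $r(a2) = a$ and so

$$\eta_{a,1}\eta_{a,2} = \frac{1 - \bar\mu_r^2}{1 - \bar\mu_a^2} \prod_{(u,v) \in E(12)} \eta_{u,v}.$$

Hence,

$$\kappa_{1 \ldots n} = \Biggl( \frac{1 - \bar\mu_r^2}{1 - \bar\mu_a^2}\, \bar\mu_a \prod_{(u,v) \in E(12)} \eta_{u,v} \Biggr) \kappa_{aA}. \tag{53}$$

The degree of $a$ in $T$ is three and the degree of all the other inner nodes of $T(12)$ is two. Moreover, $E = E(aA) \cup E(12)$ and $V \setminus [n] = (V(aA) \setminus aA) \cup (V(12) \setminus \{1,2\})$. It follows that both (52) and (53) satisfy (22).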



Appendix C: Proofs of the theorems 



Proof of Theorem 5.4. If each inner node of $T$ has degree at least three in $\hat T$, then for each inner node $u$ it is possible to find $i,j,k \in [n]$ separated by $u$ in $\hat T$. So $\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk} \neq 0$. Thus, by (28), we can determine all values $\bar\mu_u^2 = \hat{\bar\mu}_u^2 \neq 1$. Since, by Remark 5.2(ii), all the equivalence classes in $[E \setminus \hat E]$ are just single edges, we can identify all $\eta_{u,v}^2 = \hat\eta_{u,v}^2 \neq 0$ for all $(u,v) \in E \setminus \hat E$ by Lemma 5.3.

We now show that, because all equivalence classes in $[\hat E]$ are singletons, $\eta_{w,w'} = 0$ for every $(w,w') \in \hat E$. By construction, for each $(w,w') \in \hat E$, either both $w$ and $w'$ have degrees at least three in $\hat T$ or one of them is a leaf and the other has degree at least three in $\hat T$. Therefore, there exist $i,j \in [n]$ such that $E(ij) \cap \hat E = \{(w,w')\}$ by the construction of $\hat E$. We have that $\hat\mu_{ij} = 0$. However, $\eta_{u,v} = \hat\eta_{u,v} \neq 0$ for all $(u,v) \in E \setminus \hat E$. Because $\bar\mu_{r(ij)}^2 = \hat{\bar\mu}_{r(ij)}^2 \neq 1$, it follows by (27) that $\eta_{w,w'} = 0$. Therefore, the values of all the parameters are fixed up to signs and in this case $\hat\Omega_T$ is finite. The proof that there are exactly $2^{|V|-n}$ points in this fiber is provided in Appendix D.

To prove the second statement of Theorem 5.4, first note that, since every inner node of $T$ has degree at least two in $\hat T$, it follows by Lemma 5.1 that for each $v \in V$, $\bar\mu_v^2 < 1$. This implies that the $\hat p$-fiber lies in $\Omega_T^0 \subseteq \Omega_T$ as defined in Appendix A.3. We can apply a smooth transformation over this subset to a second space $\Omega'_T \subseteq \mathbb{R}^{|V|+|E|}$ whose coordinates are given by $\rho_v$ for $v \in V$ and $\rho_{uv}$ for $(u,v) \in E$. The map is defined by (37) and is invertible with the inverse defined in (38).

To investigate the geometry of the $\hat p$-fiber in $\Omega'_T$, first list all the defining constraints. For all $i = 1, \ldots, n$ we have that $\bar\mu_i = \hat{\bar\mu}_i$ because $\hat p$ determines the sample means of the observed nodes. Hence the value of $\rho_i$ is determined as well. Write $\rho_i = \hat\rho_i$ for all $i = 1, \ldots, n$, where $\hat\rho_i$ is the image of $\hat{\bar\mu}_i$ under (37). For each inner node $v$ whose degree in $\hat T$ is at least three, we can find $i,j,k \in [n]$ separated in $\hat T$ by $v$. The value of $\bar\mu_v^2$ is determined by (28), which is well defined because $\hat\mu_{ij}\hat\mu_{ik}\hat\mu_{jk} > 0$. Therefore, the value of $\rho_v^2$, for each $v$ whose degree in $\hat T$ is at least three, is fixed: $\rho_v^2 = \hat\rho_v^2$, where $\rho_v^2 = \frac{4\bar\mu_v^2}{1 - \bar\mu_v^2}$ by (37).

Next, we show that for every $(u,v) \in \hat E$ we must have that $\rho_{uv} = 0$. This follows by essentially the same argument as in the first part of the proof. Because the degrees of both $u$ and $v$ are at least two, there exist $i,j \in [n]$ such that $E(ij) \cap \hat E = \{(u,v)\}$. In particular, $\hat\mu_{ij} = 0$ and so by (27) $\eta_{u,v} = 0$. Moreover, for any path $E(kl)$ in $[E \setminus \hat E]$ the value of $\rho_{kl}^2$ is constant by Lemma 5.3. So write $\rho_{kl}^2 = \hat\rho_{kl}^2$. By (42), we have that

$$\hat\rho_{kl}^2 = \prod_{(u,v) \in E(kl)} \rho_{uv}^2. \tag{54}$$

Finally, for any degree-two node $v$ the parameter $\rho_v$ can take any real value and each $\rho_{uv}$ is constrained to satisfy (40). This completes the list of constraints defining the image of the $\hat p$-fiber in $\Omega'_T$.

We now show that this image is diffeomorphic to a union of polyhedra. Let $\rho = ((\rho_v), (\rho_{uv}))$ be any point in the transformed $\hat p$-fiber. Then $\rho$ lies in a linear subspace $\mathcal{L}$ of $\mathbb{R}^{|V|+|E|}$ given by $\rho_{uv} = 0$ for all $(u,v) \in \hat E$. Since $\rho_{uv} \neq 0$ for all $(u,v) \in E \setminus \hat E$, we can define the following further smooth change of coordinates on $\mathcal{L}$. Let $s : E \to \{-1, 0, 1\}$ be any possible sign assignment for $(\rho_{uv})$ such that $s(u,v) = \mathrm{sgn}(\rho_{uv})$ and $\mathrm{sgn}(\hat\rho_{ij}) = \prod_{(u,v) \in E(ij)} s(u,v)$ for all $i,j \in [n]$ (cf. Appendix D). Then $s$ induces an open orthant $\mathbb{R}_s^{|E \setminus \hat E|}$ defined by $s(u,v)\rho_{uv} > 0$ for all $(u,v) \in E \setminus \hat E$. Moreover, the disjoint union of $U_s = \mathbb{R}^{|V|} \times \mathbb{R}_s^{|E \setminus \hat E|} \subseteq \mathcal{L}$, for all possible sign assignments $s$, covers the $\hat p$-fiber, that is, each point of the $\hat p$-fiber lies in one of the $U_s$. Note also that on each $U_s$ the sign of $\rho_v$ for all nodes of the degree at least three is fixed. This follows from the fact that by (42)

$$\hat\rho_{ijk} = \rho_v \prod_{(u,w) \in E(ijk)} \rho_{uw}$$

for any three leaves $i,j,k \in [n]$ separated by $v$ in $\hat T$. Since on each $U_s$ the signs of $\rho_{uw}$ for all $(u,w) \in E(ijk)$ are fixed, the sign of $\rho_v$ also has to be fixed to match the sign of $\hat\rho_{ijk}$. We write $\rho_v = \rho_v^s$ on $U_s$.

On each $U_s$ define a map to the space $\mathbb{R}^{|V|+|E \setminus \hat E|}$ with coordinates given by $\nu_{uv}$ for
$(u,v) \in E \setminus \hat E$ and $z_v$ for $v \in V$. The map is a diffeomorphism defined as follows. We set

$$\nu_{uv} = \log(s(u,v)\rho_{uv}) \quad \text{for all } (u,v) \in E \setminus \hat E.$$

Next, for every $v \in V$ we replace $\bar\mu_v$ by $t_v$ as defined in (39). This is an invertible
transformation because

_ tl-1 

Pv = — , 

which is well defined since $t_v > 0$ for all $v \in V$. We then simply replace $t_v$ by $z_v = \log t_v$.

In this new coordinate system, the $\hat p$-fiber restricted to $U_s$ is a union of polyhedra. The
defining constraints are as follows. First,

$$z_i = \hat z_i \ \text{ for all leaves } i = 1, \dots, n, \qquad z_v = \hat z^s_v \ \text{ for all } v \text{ with degree at least three in } \hat T. \tag{55}$$

Here, $\hat z_i$, $\hat z^s_v$ are real numbers obtained as images of $\hat{\bar\mu}_i$, $\bar\mu^s_v$, respectively. Moreover, for
each $E(kl) \in [E \setminus \hat E]$

$$\sum_{(u,v) \in E(kl)} \nu_{uv} = \log|\hat\rho_{kl}| \tag{56}$$

subject to additional inequality constraints

$\nu_{uv} < \min\{z_u - z_v,\ z_v - z_u\}$ if $s(u,v) = 1$,

$\nu_{uv} < \min\{z_u + z_v,\ -z_u - z_v\}$ if $s(u,v) = -1$, for each $(u,v) \in E \setminus \hat E$ and (57)
z„ > for the inner nodes of degree 2. 

These inequalities follow from (40). Since all these constraints are linear, they define
a polyhedron in $\mathbb{R}^{|V|+|E \setminus \hat E|}$. Therefore the $\hat p$-fiber is a disjoint union of subsets each of
which is diffeomorphic to a polyhedron.
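
As a small numerical illustration of why the logarithmic coordinates make the constraints linear, the following Python sketch takes hypothetical edge correlations along a single path $E(kl)$, applies $\nu_{uv} = \log(s(u,v)\rho_{uv})$, and checks that the product constraint (54) turns into the linear constraint (56). The numerical values and the helper name `log_coordinates` are assumptions made only for this illustration.

```python
import math

def log_coordinates(rho_path):
    """Map nonzero edge correlations to their signs s(u,v) and log-magnitudes nu_uv."""
    signs = [1 if r > 0 else -1 for r in rho_path]
    nus = [math.log(s * r) for s, r in zip(signs, rho_path)]
    return signs, nus

# Hypothetical edge correlations along one path E(kl); all are nonzero.
rho_path = [0.8, -0.5, 0.9]
signs, nus = log_coordinates(rho_path)

rho_kl = math.prod(rho_path)      # product form, as in (54)
lhs = sum(nus)                    # linear form, left-hand side of (56)
rhs = math.log(abs(rho_kl))       # right-hand side of (56)
sign_kl = math.prod(signs)        # sign pattern accumulated along the path

print(lhs, rhs, sign_kl)
```

The sign of the path correlation is carried separately by the sign assignment $s$, which is why the construction works orthant by orthant.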

To show that the dimension of each polyhedron is equal to $2l_2$, we must ensure that the
dimension of the smallest affine subspace containing this polyhedron is 2Z2 - Since z v > 
for all v G V it is easily checked that the inequalities in (57) do not induce any equality. 
Therefore, the description of the affine span is obtained from the description of the
polyhedron (given by (55)-(57)) by suppressing all inequalities in (57). The dimension of
the ambient space is $|V| + |E \setminus \hat E|$; the codimension is given by the number of equations
in (55) and (56). Hence the codimension is equal to $|V| - l_2 + |[E \setminus \hat E]|$. For each $E(kl) \in
[E \setminus \hat E]$ one has that $|E(kl)| - 1$ is equal to the number of degree-two nodes in $E(kl)$. By
summing over all $E(kl)$ it follows that $|E \setminus \hat E| - |[E \setminus \hat E]| = l_2$. Therefore, the dimension
of the polyhedron is given by

$$(|V| + |E \setminus \hat E|) - (|V| - l_2 + |[E \setminus \hat E]|) = 2l_2.$$

Since the dimension of the affine span of a polyhedron is equal to its dimension, the
dimension is equal to $2l_2$ as required. □
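
The counting argument above can be checked mechanically on a small example. The Python sketch below is only an illustration under stated assumptions: the forest is passed as a node list and an edge list (playing the role of $V$ and $E \setminus \hat E$), the function name `fiber_dimension` is hypothetical, and the example tree is made up. It counts the degree-two nodes, finds the maximal paths whose interior nodes all have degree two, and evaluates ambient dimension minus codimension.

```python
from collections import defaultdict

def fiber_dimension(nodes, edges):
    """Return (dimension, l2, number_of_paths) for a forest given by its edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = {v: len(adj[v]) for v in nodes}
    l2 = sum(1 for v in nodes if deg[v] == 2)

    # Walk from every node of degree != 2 through chains of degree-two nodes.
    # Each maximal path is reached once from each of its two end nodes.
    path_ends = 0
    for start in nodes:
        if deg[start] == 2:
            continue
        for nxt in adj[start]:
            prev, cur = start, nxt
            while deg[cur] == 2:
                n1, n2 = adj[cur]
                prev, cur = cur, (n2 if n1 == prev else n1)
            path_ends += 1
    num_paths = path_ends // 2

    ambient = len(nodes) + len(edges)        # |V| + |E \ E^|
    codim = (len(nodes) - l2) + num_paths    # number of equations in (55) and (56)
    return ambient - codim, l2, num_paths

# Hypothetical tree: leaves 1, 2, 3; inner node "a" has degree two, "b" has degree three.
nodes = [1, 2, 3, "a", "b"]
edges = [(1, "a"), ("a", "b"), ("b", 2), ("b", 3)]
dim, l2, num_paths = fiber_dimension(nodes, edges)
print(dim, l2, num_paths)
```

In this example the number of edges minus the number of paths equals the number of degree-two nodes, which is the identity used in the proof.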

Proof of Theorem 5.8. Let $V_0 \subseteq V$ and $E_0 \subseteq E$ and

$$\Omega_{(V_0,E_0)} = \{\omega \in \mathbb{R}^{|V|+|E|} : \bar\mu_v^2 = 1 \text{ for all } v \in V_0,\ \eta_{u,v} = 0 \text{ for all } (u,v) \in E_0\}. \tag{58}$$

We say that $(V_0, E_0)$ is minimal for $\hat\Sigma$ if for every point $\omega$ in $\Omega_{(V_0,E_0)}$ and for every $i, j \in [n]$
such that $\hat\mu_{ij} = 0$ we have that $\mu_{ij}(\omega) = 0$ and furthermore that $(V_0, E_0)$ is minimal with
such a property (with respect to inclusion on both coordinates).

To illustrate the motivation behind this definition, consider the tripod tree singular
case in Example 5.7. If $T$ is rooted in the inner node $h$, we have four minimal subsets of
$2^V \times 2^E$: $(\{h\}, \emptyset)$, $(\emptyset, \{(h,1),(h,2)\})$, $(\emptyset, \{(h,1),(h,3)\})$ and $(\emptyset, \{(h,2),(h,3)\})$.
We now show that the $\hat p$-fiber satisfies

$$\hat\Omega_T = \bigcup_{(V_0,E_0)\ \text{min.}} \Omega_{(V_0,E_0)} \cap \Omega_T. \tag{59}$$

The first inclusion "$\subseteq$" follows from the fact that if $\omega \in \hat\Omega_T$ then $\mu_{ij}(\omega) = \hat\mu_{ij}$ for all
$i, j \in [n]$. In particular, $\mu_{ij}(\omega) = 0$ whenever $\hat\mu_{ij} = 0$. Therefore, $\omega \in \Omega_{(V_0,E_0)} \cap \Omega_T$ for
some minimal $(V_0, E_0)$. The second inclusion is obvious.

For each minimal $(V_0, E_0)$ the set $\Omega_{(V_0,E_0)} \cap \Omega_T$ is a union of disjoint manifolds
in $\mathbb{R}^{|V|+|E|}$ constrained to $\Omega_T$. To show this, consider first all the connected components
$T_i = (V_i, E_i)$ for $i = 1, \dots, k$ of $\hat T$ except isolated inner nodes of $\hat T$. By Remark 5.2(iv), all
these components are trees with a set of leaves contained in $[n]$. The projection of the
parameter space $\Omega_T$ to the parameters for the marginal model $\mathcal{M}_{T_i}$ is denoted by $\Omega_i$.
It is therefore a projection of $\Omega_T$ on $\bar\mu_v$ for $v \in V_i$ and $\eta_{u,v}$ for $(u,v) \in E_i$. By
Theorem 5.4, each component $T_i$ induces a manifold with corners in $\Omega_i$, denoted by $\hat\Omega_i$. Hence
there exists a manifold $M_i$ in $\mathbb{R}^{|V_i|+|E_i|}$ such that $\hat\Omega_i = M_i \cap \Omega_i$. The constraints on the
remaining coordinates are given by: $\bar\mu_v^2 = 1$ for all $v \in V_0$ and $\eta_{u,v} = 0$ for $(u,v) \in E_0$.
These algebraic equations define a union $M_{(V_0,E_0)}$ of affine subspaces in $\mathbb{R}^{|V|+|E|}$ with
coordinates given by $\bar\mu_v$ for $v \in V$ and $\eta_{u,v}$ for $(u,v) \in E$.

For each $(V_0, E_0)$, consider the union of manifolds $M \subset \mathbb{R}^{|V|+|E|}$ given as the
Cartesian product of $M_{(V_0,E_0)}$ and $M_i$ for $i = 1, \dots, k$. The restriction of $M$ to $\Omega_T$ is exactly
$\Omega_{(V_0,E_0)} \cap \Omega_T$. Now we have that

$$\bigcap_{(V_0,E_0)\ \text{min.}} \bigl(M_{(V_0,E_0)} \times M_1 \times \cdots \times M_k\bigr) = \Bigl(\bigcap_{(V_0,E_0)\ \text{min.}} M_{(V_0,E_0)}\Bigr) \times M_1 \times \cdots \times M_k. \tag{60}$$

However, $\bigcap_{(V_0,E_0)\ \text{min.}} M_{(V_0,E_0)}$ is equal to

{we# l+|Ji| : ^ = 1 for all v GV,r] UiV =0 for all (it, 

where, after the restriction to $\Omega_T$, the intersection in (60) is equal to the deepest
singularity. □



Appendix D: Sign patterns for parameters 

Let $\hat p \in \mathcal{M}_T$ be such that each inner node of $T$ has degree at least three in the corresponding
forest $\hat T$. By the proof of Theorem 5.4, there is a finite number of points $\theta \in \Theta_T$
such that $f_T(\theta) = \hat p$. By definition, this set of points is denoted by $\hat\Theta_T$. Corollary 5.5
gives the formulae for the parameters modulo signs, which suggests that $|\hat\Theta_T| = 2^{|V|+|E|}$.
However, not all sign choices are possible. Let $m$ be the number of inner nodes of $T$. We
will show that the number of possible choices of signs is, in fact, equal to $2^m$, that is,
$|\hat\Theta_T| = 2^m$. We also show how to obtain all the points in $\hat\Theta_T$ given one of them. This
construction becomes especially simple when expressed in the new parameters defined
by (35).

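Let $\theta$ be a point in $\hat\Theta_T$ ($\hat\Theta_T$ is finite and non-empty) and let $\omega = f_{\theta\omega}(\theta)$. We assign
signs to each edge of $T$ using the map $s : E \to \{-1, 0, 1\}$ such that for every $(u,v) \in E$,
$s(u,v) = \operatorname{sgn}(\eta_{u,v})$, where $\eta_{u,v}$ are parameters in $\omega$. Let $h$ be an inner node of $T$. On $\hat\Omega_T$
we define the operation of local sign switching $\delta_h$ such that $\delta_h(\omega) = \omega'$, where $\eta'_{u,v} = -\eta_{u,v}$
if one of the ends of $(u,v)$ is $h$ and $\eta'_{u,v} = \eta_{u,v}$ otherwise; $\bar\mu'_h = -\bar\mu_h$ and $\bar\mu'_v = \bar\mu_v$ for
all $v \neq h$. We have that $\bar\mu'_i = \bar\mu_i$ and hence $\lambda'_i = \lambda_i$ for all leaves $i = 1, \dots, n$. Let now
$I \in [n]_{\geq 2}$. Then, from (22),

$$\kappa_I(\omega) = \frac{1}{4}\bigl(1 - \bar\mu_{r(I)}^2\bigr) \prod_{v \in V(I) \setminus I} \bar\mu_v^{\deg(v)-2} \prod_{(u,v) \in E(I)} \eta_{u,v}.$$

We have two cases: either $h$ lies in $V(I)$ or not. In the first case,

$$\kappa_I(\omega') = (-1)^{\deg(h)-2}(-1)^{\deg(h)}\kappa_I(\omega) = \kappa_I(\omega).$$

In the second case, none of the parameters entering $\kappa_I$ is changed by $\delta_h$, and hence trivially
$\kappa_I(\omega') = \kappa_I(\omega)$. It follows that $\omega' \in \hat\Omega_T$
and therefore the operator $\delta_h : \hat\Omega_T \to \hat\Omega_T$ is well defined. The local sign switchings form
a group $\mathcal{G}$ that is isomorphic to the multiplicative group $\mathbb{Z}_2^m$. By composing distinct
local switchings we obtain $2^m$ different points in $\hat\Omega_T$. Hence the orbit of $\omega$ in $\hat\Omega_T$ has
exactly $2^m$ elements.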
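
The invariance of the tree cumulants under local sign switching can be checked numerically. The sketch below is an illustration only: it assumes that (22) has the product form $\kappa_I = \frac14(1-\bar\mu_{r(I)}^2)\prod_{v \in V(I)\setminus I}\bar\mu_v^{\deg(v)-2}\prod_{(u,v)\in E(I)}\eta_{u,v}$, with degrees taken in the subtree spanned by $I$, and it uses a hypothetical rooted quartet tree with made-up parameter values. The names `tree_cumulant` and `switch` are not from the paper.

```python
import math
from itertools import combinations

# Hypothetical rooted quartet tree: root "a" has children 1, 2, "b"; "b" has children 3, 4.
parent = {1: "a", 2: "a", "b": "a", 3: "b", 4: "b"}
leaves = [1, 2, 3, 4]
inner = ["a", "b"]

def path_to_root(v):
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_cumulant(I, mu_bar, eta):
    """Evaluate the assumed product form of (22) on the subtree spanned by I."""
    paths = [path_to_root(i) for i in I]
    common = set(paths[0]).intersection(*paths[1:])
    r = next(v for v in paths[0] if v in common)        # root r(I) of the subtree
    edges = {(parent[c], c) for p in paths for c in p[:p.index(r)]}
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    value = 0.25 * (1.0 - mu_bar[r] ** 2)
    for v, d in deg.items():
        if v not in I:                                  # inner nodes of the subtree
            value *= mu_bar[v] ** (d - 2)
    for e in edges:
        value *= eta[e]
    return value

def switch(h, mu_bar, eta):
    """Local sign switching at inner node h."""
    new_mu = dict(mu_bar)
    new_mu[h] = -mu_bar[h]
    new_eta = {e: (-x if h in e else x) for e, x in eta.items()}
    return new_mu, new_eta

# Made-up parameter values (all nonzero at inner nodes and edges).
mu_bar = {"a": 0.3, "b": -0.6, 1: 0.1, 2: -0.2, 3: 0.4, 4: 0.5}
eta = {("a", 1): 0.7, ("a", 2): -0.4, ("a", "b"): 0.5, ("b", 3): 0.9, ("b", 4): -0.8}

subsets = [I for k in (2, 3, 4) for I in combinations(leaves, k)]
invariant = all(
    math.isclose(tree_cumulant(I, mu_bar, eta), tree_cumulant(I, *switch(h, mu_bar, eta)))
    for h in inner for I in subsets
)

# Compose switchings over all subsets of inner nodes and collect the distinct points.
orbit = set()
for mask in range(2 ** len(inner)):
    m, e = mu_bar, eta
    for bit, h in enumerate(inner):
        if (mask >> bit) & 1:
            m, e = switch(h, m, e)
    orbit.add((tuple(m[v] for v in inner), tuple(e[k] for k in eta)))

print(invariant, len(orbit))
```

With two inner nodes the orbit collected here has four points, in line with the count $2^m$ for $m = 2$.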

It remains to show that there are no other orbits of $\mathcal{G}$ in $\hat\Omega_T$. Let $\omega \in \hat\Omega_T$ and let $\omega'$
be a point in $\Omega_T$ such that $(\eta'_{u,v})^2 = \eta_{u,v}^2$ for all $(u,v) \in E$ and $(\bar\mu'_v)^2 = \bar\mu_v^2$ for all inner
nodes $v$ of $T$, which is a necessary condition for $\omega'$ to be in $\hat\Omega_T$. Assume that $\omega'$ is not
in the orbit of $\omega$. We will show below that this implies that $\omega'$ cannot lie in the $\hat p$-fiber.
It will then follow that the orbit of $\omega$ constitutes the whole $\hat\Omega_T$ and hence $|\hat\Omega_T| = 2^m$.

We proceed by contradiction. Thus, let $\omega' \in \hat\Omega_T$ and we want to show that $\omega' = \delta(\omega)$ for
some $\delta \in \mathcal{G}$. Since $\omega$ can be replaced by any other point in its orbit, we can assume that
$\operatorname{sgn}(\bar\mu_v) = \operatorname{sgn}(\bar\mu'_v)$ for all $v \in V$. Since $\omega, \omega' \in \hat\Omega_T$, for every $i, j, k \in [n]$ by (22) applied
for $\kappa_{ij}$ and $\kappa_{ijk}$, respectively, we have that

$$\prod_{(u,v) \in E(ij)} s(u,v) = \prod_{(u,v) \in E(ij)} s'(u,v), \qquad \prod_{(u,v) \in E(ijk)} s(u,v) = \prod_{(u,v) \in E(ijk)} s'(u,v).$$

It follows that $\prod_{(u,v) \in E(vi)} s(u,v) = \prod_{(u,v) \in E(vi)} s'(u,v)$ for each inner node $v$ and leaf $i$.
It immediately implies that $s(u,v) = s'(u,v)$ for all $(u,v) \in E$ and hence $\omega = \omega'$. In this
way we have shown that $\omega'$ is in the orbit of $\omega$ under $\mathcal{G}$.
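
The count of admissible sign patterns can also be illustrated by brute force. The following Python sketch is a simplified check under stated assumptions: it uses a hypothetical quartet tree with all edge parameters nonzero, looks only at the pairwise constraints (equal sign products along every leaf-to-leaf path $E(ij)$), and counts how many edge sign assignments share these products with a fixed reference assignment. The helper names are made up for the example.

```python
import math
from itertools import combinations, product

# Hypothetical quartet tree: inner nodes "a" and "b", leaves 1, 2 at "a" and 3, 4 at "b".
parent = {1: "a", 2: "a", "b": "a", 3: "b", 4: "b"}
leaves = [1, 2, 3, 4]
edges = [(p, c) for c, p in parent.items()]

def path_edges(i, j):
    """Edges on the unique path between nodes i and j."""
    def up(v):
        out = []
        while v in parent:
            out.append((parent[v], v))
            v = parent[v]
        return out
    # The symmetric difference drops the part shared by both paths to the root.
    return set(up(i)) ^ set(up(j))

def pair_signs(s):
    """Sign product along E(ij) for every pair of leaves."""
    return {
        (i, j): math.prod(s[e] for e in path_edges(i, j))
        for i, j in combinations(leaves, 2)
    }

# Reference sign assignment (arbitrary choice for the illustration).
s_ref = dict(zip(edges, [1, -1, 1, -1, 1]))
target = pair_signs(s_ref)

compatible = [
    dict(zip(edges, signs))
    for signs in product([1, -1], repeat=len(edges))
    if pair_signs(dict(zip(edges, signs))) == target
]
print(len(compatible), "of", 2 ** len(edges))
```

For this tree the compatible assignments are exactly those obtained from the reference one by switching signs at the inner nodes "a" and "b", so the count is $4 = 2^2$ out of the 32 possible edge sign patterns.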